Abstract In this work we establish and investigate connections between causes
for query answers in databases, database repairs with respect to denial
constraints, and consistency-based diagnosis. The first two are relatively new
research areas in databases, and the third one is an established subject in
knowledge representation. We show how to obtain database repairs from causes, and
the other way around. Causality problems are formulated as diagnosis problems,
and the diagnoses provide causes and their responsibilities. The vast
body of research on database repairs can be applied to the newer problems
of computing actual causes for query answers and their responsibilities. These
connections are interesting per se. They also allow us, after a transition
inspired by consistency-based diagnosis to computational problems on hitting-sets
and vertex covers in hypergraphs, to obtain several new algorithmic and
complexity results for database causality.

Keywords causality · diagnosis · repairs · consistent query answering ·
integrity constraints


1 Introduction 

When querying a database, a user may not always obtain the expected results, 
and the system could provide some explanations. They could be useful to 
further understand the data or check if the query is the intended one. Actually, 
the notion of explanation for a query result was introduced in [47], on the basis
of the deeper concept of actual causation.^1

A tuple t is an actual cause for an answer $\bar{a}$ to a conjunctive query Q
from a relational database instance D if there is a contingent set of tuples $\Gamma$,
such that, after removing $\Gamma$ from D, $\bar{a}$ is still an answer, but after further
removing t from $D \setminus \Gamma$, $\bar{a}$ is not an answer anymore (cf. Section 2.1 for a
precise definition). Here, $\Gamma$ is a set of tuples that has to accompany t so that
the latter becomes a counterfactual cause for answer $\bar{a}$. Actual causes and
contingent tuples are restricted to be among a pre-specified set of endogenous
tuples, which are admissible, possible candidates for causes, as opposed to
exogenous tuples, which may also be present in the database. In the rest of this
paper, whenever we simply say “cause”, we mean “actual cause”.

In applications involving large data sets, it is crucial to rank potential
causes by their responsibilities, which reflect the relative (quantitative)
degrees of their causality for a query result. The responsibility measure for a
cause is based on its contingency sets: the smaller (one of) its contingency
sets, the stronger it is as a cause.

Actual causation, as used in [47], can be traced back to [^1^ . which 
provides a model-based account of causation on the basis of counterfactual 
dependence. Causal responsibility was introduced in [H] , to provide a graded, 
quantitative notion of causality when multiple causes may over-determine an 
outcome. 

Apart from the explicit use of causality, research on explanations for query 
results has focused mainly, and rather implicitly, on provenance [lainiiini 
nmniiMMi]. A close connection between causality and provenance has been 
established in m- However, causality is a more refined notion that identifies 
causes for query results on the basis of user-defined criteria, and ranks causes 
according to their responsibilities SB). 

Consistency-based diagnosis [53], a form of model-based diagnosis [50, Sec.
10.3], is an area of knowledge representation. The problem here is, given the
specification of a system in some logical formalism and a usually unexpected
observation about the system, to obtain explanations for the observation, in
the form of a diagnosis for the unintended behavior (cf. Section 2.3 for a precise
definition).

In a different direction, a database instance, D, that is expected to satisfy 
certain integrity constraints may fail to do so. In this case, a repair of D 
is a database D' that does satisfy the integrity constraints and minimally 
departs from D. Different forms of minimality can be applied and investigated. 
A consistent answer to a query from D and with respect to the integrity 
constraints is a query answer that is obtained from all possible repairs, i.e. 


^1 In contrast with general causal claims, such as “smoking causes cancer”, which refer
to some sort of related events, actual causation specifies a particular instantiation of a causal
relationship, e.g., “Joe’s smoking is a cause for his cancer”.
is invariant or certain under the class of repairs (cf. Section 2.2 for a precise
definition). These notions were introduced in [2] (see [TUH] for surveys).^2

These three forms of reasoning, namely inferring causes from databases, 
consistency-based diagnosis, and consistent query answering (and repairs) are 
all non-monotonic [S5]. For example, a (most responsible) cause for a query 
result may not be such anymore after the database is updated. Furthermore, 
they all reflect some sort of uncertainty about the information at hand. In this 
work we establish natural, precise, useful, and deeper connections between 
these three reasoning tasks. 

More precisely, we unveil a strong connection between computing causes
and their responsibilities for conjunctive query answers, on one hand, and
computing repairs in databases with respect to denial constraints, on the other.
These computational problems can be reduced to each other. In order to obtain
repairs with respect to a set of denial constraints from causes, we investigate
causes for queries that are unions of conjunctive queries, and develop
algorithms to compute causes and responsibilities.

We show that inferring and computing actual causes and their responsibilities
in a database setting become diagnosis reasoning problems and tasks.
Actually, a causality-based explanation for a conjunctive query answer can be
viewed as a diagnosis, where in essence the first-order logical reconstruction of
the relational database provides the system description [51], and the observation
is the query answer. We obtain causes and their responsibilities (and, as
a side result, also database repairs) from diagnosis.

Since the causality problems are the main focus of this work, we take advantage
of algorithms and complexity results both for consistency-based diagnosis,
on one side, and for database repairs and consistent query answering [5], on the
other. In this way, we obtain new complexity results for the main problems of
causality, namely computing actual causes, determining their responsibilities,
and obtaining most responsible causes; and also for their decision versions. In
particular, we obtain fixed-parameter polynomial-time algorithms for some of
them. More precisely, our main results are as follows (the complexity results
are all in data complexity):

1. We characterize actual causes and most responsible actual causes for a 
boolean conjunctive query in terms of subset- and cardinality-repairs of the 
instance with respect to the denial constraint associated to the query (the 
query being the violation view of the constraint). In this way we can compute 
causes from repairs. 

In the other direction, we obtain repairs of databases with respect to sets 
of denial constraints from causes for query results. For this, we extend the 
treatment of causality to unions of conjunctive queries (to represent multiple 
denial constraints). We characterize an actual cause’s responsibility in terms of 
cardinality-repairs. Along the way we provide PTIME algorithms to compute 
causes and their (minimal) contingency sets for unions of conjunctive queries. 

^2 Although not in the context of repairs, consistency-based diagnosis has been applied to
consistency restoration of a database with respect to integrity constraints m- 
2. We reduce causes for a boolean conjunctive query to consistency-based
diagnosis for the query being unexpectedly true according to a system description.

In particular, we show how to compute actual causes, their contingency sets, 
and responsibilities using the diagnosis characterization. As a side result, we 
obtain database repairs from diagnosis. 

Hitting-set-based algorithmic approaches to diagnosis [53] inspire our
algorithmic/complexity approaches to causality. In particular, we reformulate
the causality problems as hitting-set problems and vertex cover problems on
hypergraphs, which allows us to apply results and techniques for the latter to
causality.

3. We obtain several new computational complexity results: 

(a) Checking minimal contingency sets can be done in PTIME. 

(b) The responsibility decision problem for conjunctive queries, which is about
deciding if a tuple’s responsibility is greater than a bound v (that is
part of the input), is NP-complete. However, this problem becomes fixed-
parameter tractable, with the parameter being A 

(c) The problem of computing responsibilities of causes is $FP^{NP(\log(n))}$-complete.
Deciding most responsible causes is $P^{NP(\log(n))}$-complete.

(d) The structure of the resulting hitting-set problem allows us to obtain
efficient parameterized algorithms and good approximation algorithms for
computing causes and minimal contingency sets.

(e) From the repair connection we obtain that, for consistency-based diagnosis
with specifications given by positive implications with disjunctive
consequents, the problems of computing minimum-cardinality diagnoses and
computing minimum-cardinality diagnoses that contain a given atom are
both $FP^{NP(\log(n))}$-hard in the size of their underlying Herbrand structure.

4. We define notions of preferred causes; in particular one based on prioritized
repairs [59]. We also propose an approach to causality based on interventions
that are repair actions that replace attribute values by null values.

The paper is structured as follows. Section 2 introduces technical preliminaries
for relational databases, causality in databases, database repairs and
consistent query answering, consistency-based diagnosis, and relevant complexity
classes. Section 3 characterizes actual causes and responsibilities in terms of
database repairs. Section 4 characterizes repairs and consistent query answers
in terms of causes and contingency sets for queries that are unions of
conjunctive queries, and presents an algorithm for computing both of the latter.
Section 5 formulates causality and repair problems as consistency-based
diagnosis problems. Section 6 shows complexity and algorithmic results; in
particular a fixed-parameter tractability result for causes’ responsibilities, and also
about consistency-based diagnosis. Section 7 deals with preferred causes.
Section 8 discusses several relevant issues, connections and open problems around
causality in databases. It also draws some final conclusions. We provide proofs
for all the results except for those that are rather straightforward. This is an
extended version of m- It contains proofs, many improvements in the
presentation, and also new developments and results, mainly in Sections 6.2 and

m 


2 Preliminaries 

We consider relational database schemas of the form $\mathcal{S} = (U, \mathcal{P})$, where U
is the possibly infinite database domain of constants and $\mathcal{P}$ is a finite set of
database predicates^3 of fixed arities. A database instance D compatible with
$\mathcal{S}$ can be seen as a finite set of ground atomic formulas (in databases, a.k.a.
atoms or tuples), of the form $P(c_1, \ldots, c_n)$, where $P \in \mathcal{P}$ has arity n, and
$c_1, \ldots, c_n \in U$.

A conjunctive query (CQ) is a formula $Q(\bar{x})$ of the first-order (FO) logic
language, $\mathcal{L}(\mathcal{S})$, associated to $\mathcal{S}$, of the form $\exists \bar{y}(P_1(\bar{s}_1) \wedge \cdots \wedge P_m(\bar{s}_m))$, where
the $P_i(\bar{s}_i)$ are atomic formulas, i.e. $P_i \in \mathcal{P}$, and the $\bar{s}_i$ are sequences of terms,
i.e. variables or constants.^4 The $\bar{x}$ in $Q(\bar{x})$ shows all the free variables in the
formula, i.e. those not appearing in $\bar{y}$. If $\bar{x}$ is non-empty, the query is open.
If $\bar{x}$ is empty, the query is boolean (a BCQ), i.e. the query is a sentence, in
which case it is true or false in a database, denoted by $D \models Q$ and $D \not\models Q$,
respectively. A sequence $\bar{c}$ of constants is an answer to an open query $Q(\bar{x})$
if $D \models Q[\bar{c}]$, i.e. the query becomes true in D when the free variables are
replaced by the corresponding constants in $\bar{c}$.

An integrity constraint is a sentence of language $\mathcal{L}(\mathcal{S})$, and then, may be
true or false in an instance for schema $\mathcal{S}$. Given a set IC of integrity constraints
for schema $\mathcal{S}$, a database instance D is consistent with IC if $D \models IC$; otherwise
it is said to be inconsistent. In this work we assume that sets of integrity
constraints are always finite and logically consistent.

A particular class of integrity constraints is formed by denial constraints
(DCs), which are sentences $\kappa$ of the form: $\forall \bar{x} \neg (A_1(\bar{x}_1) \wedge \cdots \wedge A_n(\bar{x}_n))$, where
$\bar{x} = \bigcup \bar{x}_i$ and each $A_i(\bar{x}_i)$ is a database atom, i.e. predicate $A_i \in \mathcal{P}$. So, as with
conjunctive queries, the atoms may contain constants. Denial constraints are
exactly the negations of BCQs. Sometimes we use the common representation
of DCs as “negative rules” of the form: $\leftarrow A_1(\bar{x}_1), \ldots, A_n(\bar{x}_n)$. We will also
consider functional dependencies (FDs) as DCs. They are represented by
negative rules of the form: $\leftarrow A(\bar{x}_1, \bar{x}_2, y), A(\bar{x}_1, \bar{x}_3, z), y \neq z$, saying that the
last attribute of relation A functionally depends upon the attributes holding
variables $\bar{x}_1$. They do not contain constants, and correspond to BCQs with
inequality.


^3 As opposed to built-in predicates (e.g. $\neq$) that we assume do not appear, unless explicitly
stated otherwise.

^4 In this work, we will assume, unless otherwise explicitly said, that CQs may contain
inequality atoms (equality atoms are not an issue, because they can always be eliminated). 
2.1 Causality and responsibility

Assume that the database instance is split in two, i.e. $D = D^n \cup D^x$, where
$D^n$ and $D^x$ denote the disjoint sets of endogenous and exogenous tuples,
respectively.

Actual causes and contingent tuples are usually restricted to be among
a pre-specified set of endogenous tuples, which are admissible, possible
candidates for causes, as opposed to the exogenous tuples. Actually, the latter
provide the context or the background for the problem, and are considered as
external factors that are not of interest to the current problem statement or
beyond our control. Since no intervention (or update, in database parlance) is
conceivable on exogenous tuples, they cannot be included in any contingency
set or be an actual cause. They are assumed to be included in all conceivable
hypothetical states of a database.

The endogenous/exogenous partition is application-dependent and captures
predetermined factors, such as users’ preferences, that may affect QA-causal
analysis. For example, certain tuples or full tables might be identified
as irrelevant (or exogenous) in relation to a particular query at hand, or
decided to be exogenous or endogenous a priori, independently from the query.

A tuple $t \in D^n$ is called a counterfactual cause for a BCQ Q, if $D \models Q$ and
$D \setminus \{t\} \not\models Q$. A tuple $t \in D^n$ is an actual cause for Q if there exists $\Gamma \subseteq D^n$,
called a contingency set, such that t is a counterfactual cause for Q in $D \setminus \Gamma$

m- 

We will concentrate mostly on CQs. However, the definitions of actual cause
and contingency set can be applied without a change to monotone queries in
general 113 , in particular to unions of BCQs (UBCQs), with or without
built-ins.

The responsibility of an actual cause t for Q, denoted by $\rho_D(t)$, is the
numerical value $\frac{1}{|\Gamma| + 1}$, where $|\Gamma|$ is the size of the smallest contingency set
for t. We can extend responsibility to all the other tuples in D by setting
their value to 0. Those tuples are not actual causes for Q.

Example 1 Consider $D = D^n = \{R(a_4, a_3), R(a_2, a_1), R(a_3, a_3), S(a_4), S(a_2), S(a_3)\}$,
and the query Q: $\exists x \exists y (S(x) \wedge R(x, y) \wedge S(y))$. It holds: $D \models Q$.

Tuple $S(a_3)$ is a counterfactual cause for Q. If $S(a_3)$ is removed from
D, Q is not true anymore. Therefore, the responsibility of $S(a_3)$ is 1. Besides,
$R(a_4, a_3)$ is an actual cause for Q with contingency set $\{R(a_3, a_3)\}$. If $R(a_3, a_3)$
is removed from D, Q is still true, but further removing $R(a_4, a_3)$ makes Q
false. The responsibility of $R(a_4, a_3)$ is $\frac{1}{2}$, because its smallest contingency
sets have size 1. Likewise, $R(a_3, a_3)$ and $S(a_4)$ are actual causes for Q with
responsibility $\frac{1}{2}$.

For the same Q, but with $D = \{S(a_3), S(a_4), R(a_4, a_3)\}$, and the partition
$D^n = \{S(a_4), S(a_3)\}$ and $D^x = \{R(a_4, a_3)\}$, it turns out that both $S(a_3)$ and
$S(a_4)$ are counterfactual causes for Q. □
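The values in Example 1 can be checked by brute force. The following is only a sketch under stated assumptions: tuples are encoded as Python tuples such as ("R", "a4", "a3"), the helper names holds and responsibility are hypothetical, and the search over candidate contingency sets is exponential, so it is meant for toy instances only.

```python
from itertools import combinations

# Instance of Example 1; here every tuple is endogenous.
D = {("R", "a4", "a3"), ("R", "a2", "a1"), ("R", "a3", "a3"),
     ("S", "a4"), ("S", "a2"), ("S", "a3")}

def holds(db):
    """Q: exists x, y (S(x) and R(x, y) and S(y))."""
    return any(t[0] == "R" and ("S", t[1]) in db and ("S", t[2]) in db
               for t in db)

def responsibility(t, db, endogenous):
    """1 / (1 + |Gamma|) for a smallest contingency set Gamma of t, else 0."""
    others = sorted(endogenous - {t})
    for k in range(len(others) + 1):           # smallest candidate sets first
        for gamma in combinations(others, k):
            rest = db - set(gamma)
            # t must be a counterfactual cause in db minus gamma
            if holds(rest) and not holds(rest - {t}):
                return 1 / (1 + k)
    return 0.0

resp = {t: responsibility(t, D, D) for t in D}

# Second instance of the example: R(a4, a3) is exogenous.
D_small = {("S", "a3"), ("S", "a4"), ("R", "a4", "a3")}
endo_small = {("S", "a4"), ("S", "a3")}
resp_small = {t: responsibility(t, D_small, endo_small) for t in endo_small}
```

Exogenous tuples are never placed in a contingency set because candidates are drawn from the endogenous set only, which mirrors the restriction in the definition above.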

Remark 1 In the rest of this paper, we will assume in the context of causality
that database instances D are partitioned as $D = D^n \cup D^x$, into a subset $D^n$
of endogenous and a set $D^x$ of exogenous tuples, respectively. We will denote
with $Causes(D, Q)$ the set of actual causes for the BCQ Q (being true) from
instance D.


2.2 Database repairs 

Given a set IC of integrity constraints, a subset repair (simply, S-repair) of
a possibly inconsistent instance D for schema $\mathcal{S}$ is an instance D' for $\mathcal{S}$ that
satisfies IC and makes $\Delta(D, D') = (D \setminus D') \cup (D' \setminus D)$ minimal under set
inclusion.^5 $Srep(D, IC)$ denotes the set of S-repairs of D with respect to IC
[5]. Similarly, D' is a cardinality repair (simply, C-repair) of D if D' satisfies
IC and minimizes $|\Delta(D, D')|$. $Crep(D, IC)$ denotes the class of C-repairs of
D with respect to IC. C-repairs are always S-repairs. For DCs, S-repairs and
C-repairs are obtained from the original instance by deleting an S-minimal,
resp. C-minimal, set of tuples. In other words, S- and C-repairs under DCs
become maximal (under set inclusion), resp. maximum (in cardinality),
consistent subsets of the given instance.

In more general terms, we say that a set is S-minimal in a class of sets C if it 
is minimal under set inclusion in C. Similarly, a set is C-minimal (or minimum) 
if it is minimal in cardinality within C. S-minimality and C-minimality are 
defined similarly. 

Example 2 (ex. 1 cont.) Consider the denial constraint $\kappa$: $\leftarrow S(x), R(x, y), S(y)$,
whose body corresponds to the CQ in Example 1 and is violated by the given
instance D.

Here, $Srep(D, \kappa) = \{D_1, D_2, D_3\}$ with $D_1 = \{R(a_4, a_3), R(a_2, a_1), R(a_3, a_3),
S(a_4), S(a_2)\}$, $D_2 = \{R(a_2, a_1), S(a_4), S(a_2), S(a_3)\}$, $D_3 = \{R(a_4, a_3),
R(a_2, a_1), S(a_2), S(a_3)\}$. The only C-repair is $D_1$, i.e. $Crep(D, \kappa) = \{D_1\}$. □
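For a single DC, S-repairs are the maximal consistent subsets of the instance and C-repairs are the largest among them, so Example 2 can be reproduced by enumeration. This is a sketch only: the function names violates, s_repairs and c_repairs are hypothetical, and enumerating all subsets is exponential in the size of D.

```python
from itertools import combinations

D = frozenset({("R", "a4", "a3"), ("R", "a2", "a1"), ("R", "a3", "a3"),
               ("S", "a4"), ("S", "a2"), ("S", "a3")})

def violates(db):
    """True iff the body S(x), R(x, y), S(y) of the DC kappa is satisfied."""
    return any(t[0] == "R" and ("S", t[1]) in db and ("S", t[2]) in db
               for t in db)

def s_repairs(db):
    """Consistent subsets of db that are maximal under set inclusion."""
    tuples = sorted(db)
    consistent = [frozenset(s)
                  for k in range(len(tuples) + 1)
                  for s in combinations(tuples, k)
                  if not violates(frozenset(s))]
    return [s for s in consistent
            if not any(s < other for other in consistent)]

def c_repairs(db):
    """S-repairs of maximum cardinality."""
    repairs = s_repairs(db)
    largest = max(len(r) for r in repairs)
    return [r for r in repairs if len(r) == largest]

D1 = D - {("S", "a3")}
D2 = D - {("R", "a4", "a3"), ("R", "a3", "a3")}
D3 = D - {("R", "a3", "a3"), ("S", "a4")}
```

Under these assumptions, s_repairs(D) yields exactly the three instances named D1, D2, D3 above, and c_repairs(D) yields D1 alone.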

More generally, different repair semantics may be considered to restore
consistency with respect to general integrity constraints. They depend on the kind
of allowed updates on the database (i.e. tuple insertions/deletions, changes
of attribute values), and the minimality conditions on repairs, e.g.
subset-minimality, cardinality-minimality, etc.

Given D and IC, a repair semantics, $\mathsf{S}$, defines a class $Rep^{\mathsf{S}}(D, IC)$ of
$\mathsf{S}$-repairs, which are the intended repairs [5, Sec. 2.5]. All the elements of
$Rep^{\mathsf{S}}(D, IC)$ are instances over the same schema of D, and consistent with
respect to IC. If D is already consistent, $Rep^{\mathsf{S}}(D, IC)$ contains D as its only
member.

Given a repair semantics, $\mathsf{S}$, $\bar{c}$ is an $\mathsf{S}$-consistent answer to an open query
$Q(\bar{x})$ if $D' \models Q[\bar{c}]$ for every $D' \in Rep^{\mathsf{S}}(D, IC)$. A BCQ is $\mathsf{S}$-consistently true if


^5 In general, in the context of repairs, partitions on instances are not considered. However,
in Section 7.3 we will bring them into the repair scene.
it is true in every $D' \in Rep^{\mathsf{S}}(D, IC)$. In particular, if $\bar{c}$ is a consistent answer to
$Q(\bar{x})$ with respect to S-repairs, we say it is an S-consistent answer. Similarly
for C-consistent answers. Consistent query answering for DCs under S-repairs
was investigated in detail in [18]. C-repairs and consistent query answering under
them were investigated in detail in jUj. (Cf. [3] for more references.)


2.3 Consistency-based diagnosis 

Consistency-based diagnosis, a form of model-based diagnosis [50, Sec. 10.4],
considers problems $\mathcal{M} = (SD, COMPS, OBS)$, where SD is the description in
logic of the intended properties of a system under the explicit assumption that
all the components in COMPS are working normally. OBS is a FO sentence
that represents the observations. If the system does not behave as expected (as
shown by the observations), then the logical theory obtained from $SD \cup OBS$
plus the explicit assumption, say $\bigwedge_{c \in COMPS} \neg Ab(c)$, that the components are
indeed behaving normally, becomes inconsistent. Ab is an abnormality
predicate.^6

The inconsistency is captured via the minimal conflict sets, i.e. those minimal
subsets COMPS' of COMPS, such that $SD \cup OBS \cup \{\bigwedge_{c \in COMPS'} \neg Ab(c)\}$
is inconsistent. As expected, different notions of minimality can be used at this
point.

A minimal diagnosis for $\mathcal{M}$ is a minimal subset $\Delta$ of COMPS, such that
$SD \cup OBS \cup \{\neg Ab(c) \mid c \in COMPS \setminus \Delta\} \cup \{Ab(c) \mid c \in \Delta\}$ is consistent. That is,
consistency is restored by flipping the normality assumption to abnormality
for a minimal set of components, and those are the ones considered to be
(jointly) faulty. The notion of minimality commonly used is S-minimality, i.e.
a diagnosis that does not have a proper subset that is a diagnosis. We will
use this kind of minimality in relation to diagnosis. Diagnosis can be obtained
from conflict sets [53].

Example 3 Consider a simple logical gate Or, denoted with o (the only system
component in this case), that receives two digits, x, y, as inputs and outputs
a digit $val(x, y)$.

This simple system can be specified in terms of normal behavior by the
logical formula $\sigma$: $\neg Ab(o) \rightarrow (val(x, y) = 0 \leftrightarrow x = y = 0)$, saying
that, when the gate is not abnormal, the output is 0 iff the inputs are both 0.

The logical theory $\{\sigma, val(0, 1) = 0\}$ is logically consistent (it can be made
true) despite the unexpected observation (namely, output 0 with inputs 0, 1).
This is because the system’s model allows for abnormal behaviors. However,
this theory together with the extra assumption $\neg Ab(o)$, i.e. that the gate is
normal, form the theory $\{\sigma, val(0, 1) = 0, \neg Ab(o)\}$ that is inconsistent in the
sense that it cannot be made true (in technical terms, it has no models). □
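The gate of Example 3 can be turned into a tiny diagnosis problem. The sketch below makes a simplifying assumption: since the observation fixes the inputs and the output, checking consistency of the theory reduces to evaluating the formula for the gate under a chosen assignment of Ab. The names COMPS, OBS, sd_holds, consistent and minimal_diagnoses are illustrative only.

```python
from itertools import combinations

COMPS = ["o"]                        # the single Or gate
OBS = {"x": 0, "y": 1, "out": 0}     # observation: val(0, 1) = 0

def sd_holds(ab, obs):
    """sigma: not Ab(o) -> (val(x, y) = 0 <-> x = y = 0)."""
    normal_behavior = (obs["out"] == 0) == (obs["x"] == 0 and obs["y"] == 0)
    return ab["o"] or normal_behavior

def consistent(delta):
    """SD + OBS + Ab(c) for c in delta + not Ab(c) for the other components."""
    ab = {c: (c in delta) for c in COMPS}
    return sd_holds(ab, OBS)

def minimal_diagnoses():
    """S-minimal sets delta of components that restore consistency."""
    found = []
    for k in range(len(COMPS) + 1):              # smaller sets first
        for delta in map(frozenset, combinations(COMPS, k)):
            if consistent(delta) and not any(f < delta for f in found):
                found.append(delta)
    return found
```

With the observation above, the empty set is not a diagnosis (the theory with the gate assumed normal is inconsistent), and the only minimal diagnosis declares the gate o abnormal.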

^6 Here, and as usual, the atom Ab(c) expresses that component c is (behaving)
abnormal(ly).
2.4 Complexity classes

We recall some complexity classes [55] used in this paper. FP is the class of
functional problems associated to decision problems in the class PTIME, i.e.
that are solvable in polynomial time. $P^{NP}$ (or $\Delta_2^P$) is the class of decision
problems solvable in polynomial time by a machine that makes calls to an NP
oracle. For $P^{NP(\log(n))}$ the number of calls is logarithmic. It is not known if
$P^{NP(\log(n))}$ is strictly contained in $P^{NP}$. $FP^{NP(\log(n))}$ is similarly defined.


3 Actual Causes From Database Repairs 

In this section we characterize actual causes for a BCQ Q being true in a 
database instance D in terms of the repairs of D with respect to a denial 
constraint whose violation view is Q, i.e. the latter asks if the constraint is 
violated. In essence, the actual causes will become the tuples outside an S- 
repair. The complement of the latter contains the cause plus a contingency set 
for the cause. In order to capture responsibility, C-repairs are considered. 

Let D be an instance for schema $\mathcal{S}$, and Q: $\exists \bar{x}(P_1(\bar{x}_1) \wedge \cdots \wedge P_m(\bar{x}_m))$ a
BCQ. Q may be unexpectedly true, i.e. $D \models Q$. Now, $\neg Q$ is logically equivalent
to the DC $\kappa(Q)$: $\forall \bar{x} \neg (P_1(\bar{x}_1) \wedge \cdots \wedge P_m(\bar{x}_m))$. The requirement that $\neg Q$ holds
can be captured by imposing $\kappa(Q)$ on D. Due to $D \models Q$, it holds $D \not\models \kappa(Q)$.
So, D is inconsistent with respect to $\kappa(Q)$, and could be repaired.

Repairs for (violations of) DCs are obtained by tuple deletions. Intuitively, 
a tuple that participates in a violation of k{Q) in D is an actual cause for Q. 
S-minimal sets of tuples like this are expected to correspond to S-repairs for 
D with respect to k{Q). 

More precisely, given an instance D, a BCQ Q, and a tuple $t \in D^n$, we
consider:

- The class containing the sets of differences between D and those S-repairs
that do not contain t, and are obtained by removing a subset of $D^n$:

$Diff^s(D, \kappa(Q), t) = \{D \setminus D' \mid D' \in Srep(D, \kappa(Q)),\ t \in (D \setminus D') \subseteq D^n\}$. (1)

- The class containing the sets of differences between D and those C-repairs
that do not contain t, and are obtained by removing a subset of $D^n$:

$Diff^c(D, \kappa(Q), t) = \{D \setminus D' \mid D' \in Crep(D, \kappa(Q)),\ t \in (D \setminus D') \subseteq D^n\}$. (2)

It holds $Diff^c(D, \kappa(Q), t) \subseteq Diff^s(D, \kappa(Q), t)$.

Now, any $\Lambda \in Diff^s(D, \kappa(Q), t)$ can be written as $\Lambda = \Lambda' \cup \{t\}$. From
the S-minimality of S-repairs, it follows that $D \setminus (\Lambda' \cup \{t\}) \models \kappa(Q)$, but
$D \setminus \Lambda' \models \neg\kappa(Q)$. That is, $D \setminus (\Lambda' \cup \{t\}) \not\models Q$, but $D \setminus \Lambda' \models Q$. As a
consequence, t is an actual cause for Q with contingency set $\Lambda'$. We have
obtained the following result.
Proposition 1 Given an instance D and a BCQ Q, $t \in D^n$ is an actual cause
for Q iff $Diff^s(D, \kappa(Q), t) \neq \emptyset$. Furthermore, if $D \setminus D' \in Diff^s(D, \kappa(Q), t)$,
then $D \setminus (D' \cup \{t\})$ is a minimal contingency set for t. □

Proposition 2 Given an instance D, a BCQ Q, and $t \in D^n$:

(a) If $Diff^s(D, \kappa(Q), t) = \emptyset$, then $\rho_D(t) = 0$.

(b) Otherwise, $\rho_D(t) = \frac{1}{|\Lambda|}$, where $\Lambda \in Diff^s(D, \kappa(Q), t)$ and there is no
$\Lambda' \in Diff^s(D, \kappa(Q), t)$ such that $|\Lambda'| < |\Lambda|$. □

Corollary 1 Given an instance D and a BCQ Q: $t \in D^n$ is a most responsible
actual cause for Q iff $Diff^c(D, \kappa(Q), t) \neq \emptyset$. □

Example 4 (ex. 1 and 2 cont.) Consider the same instance D and query Q. The
associated DC is $\kappa(Q)$: $\leftarrow S(x), R(x, y), S(y)$, which we considered in Example 2,
where we obtained $Srep(D, \kappa(Q)) = \{D_1, D_2, D_3\}$ and $Crep(D, \kappa(Q)) = \{D_1\}$.

For tuple $R(a_4, a_3)$, $Diff^s(D, \kappa(Q), R(a_4, a_3)) = \{D \setminus D_2\} = \{\{R(a_4, a_3),
R(a_3, a_3)\}\}$, which, by Propositions 1 and 2, confirms that $R(a_4, a_3)$ is an
actual cause, with responsibility $\frac{1}{2}$. The set $D \setminus D_2$, i.e. the complement of the
repair $D_2$, contains the actual cause $R(a_4, a_3)$ plus a contingency set for it,
namely that formed by tuple $R(a_3, a_3)$, which has to be deleted together with
the actual cause $R(a_4, a_3)$ to restore consistency (cf. Example 2).

For tuple $S(a_3)$, $Diff^s(D, \kappa(Q), S(a_3)) = \{D \setminus D_1\} = \{\{S(a_3)\}\}$. So, $S(a_3)$
is an actual cause with responsibility 1.

Similarly, $R(a_3, a_3)$ is an actual cause with responsibility $\frac{1}{2}$, because
$Diff^s(D, \kappa(Q), R(a_3, a_3)) = \{D \setminus D_2, D \setminus D_3\} = \{\{R(a_4, a_3), R(a_3, a_3)\},
\{R(a_3, a_3), S(a_4)\}\}$.

It holds $Diff^s(D, \kappa(Q), S(a_2)) = Diff^s(D, \kappa(Q), R(a_2, a_1)) = \emptyset$, because
all repairs contain $S(a_2)$, $R(a_2, a_1)$. This means they do not participate in the
violation of $\kappa(Q)$ or contribute to make Q true. So, they are not actual causes
for Q, confirming the result in Example 1.

$Diff^c(D, \kappa(Q), S(a_3)) = \{\{S(a_3)\}\}$. From Corollary 1, $S(a_3)$ is the most
responsible cause. □
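Propositions 1 and 2 suggest a direct, if naive, way to read causes and responsibilities off the S-repairs. The sketch below recomputes the repairs of Example 2 by enumeration and then builds the difference sets; all tuples are taken to be endogenous, and the names q_true, s_repairs, diff_s and responsibility are hypothetical helpers rather than anything defined in this paper.

```python
from fractions import Fraction
from itertools import combinations

D = frozenset({("R", "a4", "a3"), ("R", "a2", "a1"), ("R", "a3", "a3"),
               ("S", "a4"), ("S", "a2"), ("S", "a3")})

def q_true(db):
    """Q (the violation view of kappa(Q)): S(x), R(x, y), S(y)."""
    return any(t[0] == "R" and ("S", t[1]) in db and ("S", t[2]) in db
               for t in db)

def s_repairs(db):
    """Maximal subsets of db on which Q is false."""
    tuples = sorted(db)
    ok = [frozenset(s) for k in range(len(tuples) + 1)
          for s in combinations(tuples, k) if not q_true(frozenset(s))]
    return [s for s in ok if not any(s < other for other in ok)]

def diff_s(t, db):
    """Deleted sets (db minus repair) of the S-repairs that do not contain t."""
    return [db - rep for rep in s_repairs(db) if t not in rep]

def responsibility(t, db):
    """Proposition 2: 1/|Lambda| for a smallest Lambda in diff_s, else 0."""
    diffs = diff_s(t, db)
    if not diffs:
        return Fraction(0)
    return Fraction(1, min(len(d) for d in diffs))

causes = {t for t in D if diff_s(t, D)}       # Proposition 1
```

For each difference set returned by diff_s(t, D), removing t from it gives a minimal contingency set for t, as stated in Proposition 1.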

Remark 2 The results in this section can be easily extended to unions of BCQs.
This can be done by associating a DC to each disjunct of the query, and
considering the corresponding problems for database repairs with respect to
several DCs (cf. Section 4.1). □


4 Database Repairs From Actual Causes

In this section we characterize repairs for inconsistent databases with respect
to a set of DCs in terms of actual causes with their contingency sets. The
reduction of repair-related computations to cause-related computations is
particularly relevant, because we can take advantage of known complexity results
for repairs to obtain new lower-bound complexity results for causality.
Causality has been investigated so far mainly for single conjunctive queries.
However, database repairs appear in the context of sets of constraints. We 
concentrate on sets of DCs, which requires extending the analysis of causality 
to unions of conjunctive queries. 

More concretely, in this section we characterize repairs of a database
instance D with respect to a set $\Sigma$ of DCs in terms of the actual causes (with
their contingency sets) for the union of the conjunctive queries naturally
associated to the (bodies of the) DCs. In essence, an S-repair D' is a maximal
subset of D that does not contain any actual cause t, and the tuples other
than t and outside D' form a contingency set for t. As expected, C-repairs
require the use of most responsible tuples.

Consider an instance D for schema $\mathcal{S}$, and a set of DCs $\Sigma$ on $\mathcal{S}$. For each
$\kappa \in \Sigma$, say $\kappa$: $\leftarrow A_1(\bar{x}_1), \ldots, A_n(\bar{x}_n)$, consider its associated violation view
defined by a BCQ, namely $V^{\kappa}$: $\exists \bar{x}(A_1(\bar{x}_1) \wedge \cdots \wedge A_n(\bar{x}_n))$. The answer yes to
$V^{\kappa}$ shows that $\kappa$ is violated (i.e. not satisfied) by D.

Next, consider the query that is the union of the individual violation views:
$V^{\Sigma} := \bigvee_{\kappa \in \Sigma} V^{\kappa}$, a union of BCQs (UBCQs). Clearly, D violates (is
inconsistent with respect to) $\Sigma$ iff $D \models V^{\Sigma}$.

It is easy to verify that D, with $D^x = \emptyset$, is consistent with respect to $\Sigma$
iff $Causes(D, V^{\Sigma}) = \emptyset$, i.e. there are no actual causes for $V^{\Sigma}$ when all tuples
are endogenous.

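Now, let us collect all S-minimal contingency sets associated with an actual
cause t for $V^{\Sigma}$:

Definition 1 For an instance D and a set $\Sigma$ of DCs:

$Cont(D, V^{\Sigma}, t) := \{\Gamma \subseteq D^n \mid D \setminus \Gamma \models V^{\Sigma},\ D \setminus (\Gamma \cup \{t\}) \not\models V^{\Sigma},$
$\text{and } \forall \Gamma' \subsetneq \Gamma,\ D \setminus (\Gamma' \cup \{t\}) \models V^{\Sigma}\}$. (3) □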

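Notice that for $\Gamma \in Cont(D, V^{\Sigma}, t)$, it holds $t \notin \Gamma$. When $D^x = \emptyset$, if
$t \in Causes(D, V^{\Sigma})$ and $\Gamma \in Cont(D, V^{\Sigma}, t)$, from the definition of actual
cause and the S-minimality of $\Gamma$, it holds that $\Gamma'' = \Gamma \cup \{t\}$ is an S-minimal
subset of D with $D \setminus \Gamma'' \not\models V^{\Sigma}$. So, $D \setminus \Gamma''$ is an S-repair for D. Then, the
following holds.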

Proposition 3 For an instance D, with $D^x = \emptyset$, and a set of DCs $\Sigma$: $D' \subseteq D$ is
an S-repair for D with respect to $\Sigma$ iff, for every $t \in D \setminus D'$: $t \in Causes(D, V^{\Sigma})$
and $D \setminus (D' \cup \{t\}) \in Cont(D, V^{\Sigma}, t)$. □

To establish a connection between most responsible actual causes and
C-repairs, assume that $D^x = \emptyset$, and collect the most responsible actual causes
for $V^{\Sigma}$:

Definition 2 For an instance D with $D^x = \emptyset$:

$MRC(D, V^{\Sigma}) := \{t \in D \mid t \in Causes(D, V^{\Sigma}),\ \nexists t' \in Causes(D, V^{\Sigma})$
$\text{with } \rho_D(t') > \rho_D(t)\}$. (4) □
Proposition 4 For an instance D, with $D^x = \emptyset$, and a set of DCs $\Sigma$: $D' \subseteq D$ is a
C-repair for D with respect to $\Sigma$ iff, for every $t \in D \setminus D'$: $t \in MRC(D, V^{\Sigma})$
and $D \setminus (D' \cup \{t\}) \in Cont(D, V^{\Sigma}, t)$. □

Actual causes for $V^{\Sigma}$, with their contingency sets, account for the violation
of some $\kappa \in \Sigma$. Removing those tuples from D should remove the inconsistency.
From Propositions 3 and 4 we obtain:

Corollary 2 Given an instance D and a set of DCs $\Sigma$, the instance obtained
from D by removing an actual cause, resp. a most responsible actual cause,
for $V^{\Sigma}$ together with any of its S-minimal, resp. C-minimal, contingency sets
forms an S-repair, resp. a C-repair, for D with respect to $\Sigma$. □

Example 5 Consider $D = \{P(a), P(e), Q(a, b), R(a, c)\}$ and $\Sigma = \{\kappa_1, \kappa_2\}$,
with $\kappa_1$: $\leftarrow P(x), Q(x, y)$ and $\kappa_2$: $\leftarrow P(x), R(x, y)$.

The violation views are $V^{\kappa_1}$: $\exists xy(P(x) \wedge Q(x, y))$ and $V^{\kappa_2}$: $\exists xy(P(x) \wedge
R(x, y))$. For $V^{\Sigma} := V^{\kappa_1} \vee V^{\kappa_2}$, $D \models V^{\Sigma}$, and D is inconsistent with respect to
$\Sigma$.

Now assume all tuples are endogenous. It holds $Causes(D, V^{\Sigma}) = \{P(a),
Q(a, b), R(a, c)\}$, and its elements are associated with sets of S-minimal
contingency sets, as follows: $Cont(D, V^{\Sigma}, Q(a, b)) = \{\{R(a, c)\}\}$, $Cont(D, V^{\Sigma},
R(a, c)) = \{\{Q(a, b)\}\}$, and $Cont(D, V^{\Sigma}, P(a)) = \{\emptyset\}$.

From Corollary 2 and $Cont(D, V^{\Sigma}, R(a, c)) = \{\{Q(a, b)\}\}$, $D_1 = D \setminus
(\{R(a, c)\} \cup \{Q(a, b)\}) = \{P(a), P(e)\}$ is an S-repair. So is $D_2 = D \setminus (\{P(a)\} \cup
\emptyset) = \{P(e), Q(a, b), R(a, c)\}$. These are the only S-repairs.

Furthermore, $MRC(D, V^{\Sigma}) = \{P(a)\}$. From Corollary 2, $D_2$ is also a
C-repair for D. □
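The direction from causes to repairs in Example 5 can also be traced mechanically. The sketch below is a brute-force illustration, not the algorithm developed later for UBCQs: it enumerates S-minimal contingency sets in order of size, assumes every tuple is endogenous, and uses hypothetical names (v_sigma, cont, causes, repairs, mrc).

```python
from fractions import Fraction
from itertools import combinations

D = frozenset({("P", "a"), ("P", "e"), ("Q", "a", "b"), ("R", "a", "c")})

def v_sigma(db):
    """Union of the violation views: P(x), Q(x, y)  or  P(x), R(x, y)."""
    return any(t[0] in ("Q", "R") and ("P", t[1]) in db for t in db)

def cont(t, db):
    """S-minimal contingency sets for t with respect to v_sigma."""
    others = sorted(db - {t})
    found = []
    for k in range(len(others) + 1):             # smaller sets first
        for g in map(frozenset, combinations(others, k)):
            is_contingency = v_sigma(db - g) and not v_sigma(db - g - {t})
            # g is S-minimal iff no smaller contingency set is inside it
            if is_contingency and not any(f < g for f in found):
                found.append(g)
    return found

causes = {t for t in D if cont(t, D)}

# Corollary 2: a cause together with one of its S-minimal contingency sets.
repairs = {D - g - {t} for t in causes for g in cont(t, D)}

resp = {t: Fraction(1, 1 + min(len(g) for g in cont(t, D))) for t in causes}
mrc = {t for t in causes if resp[t] == max(resp.values())}
```

Collecting the results in a set makes visible that different (cause, contingency set) pairs may determine the same repair: Q(a, b) with {R(a, c)} and R(a, c) with {Q(a, b)} both yield {P(a), P(e)}.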

Remark 3 An actual cause t with any of its S-minimal contingency sets
determines a unique S-repair. The last example shows that, with different
combinations of a cause and one of its contingency sets, we may obtain the same repair
(e.g. for the first two Cont sets). So, we may have more minimal contingency
sets than minimal repairs. However, we may still have exponentially many
minimal contingency sets, just as we may have exponentially many minimal
repairs of an instance with respect to DCs, as the following example shows.^7
□

Example 6 Consider $D = \{R(1,0), R(1,1), \ldots, R(n,0), R(n,1), S(1), S(0)\}$ and
the DC $\kappa\colon \leftarrow R(x,y), R(x,z), S(y), S(z)$. $D$ is inconsistent with respect to $\kappa$.
There are exponentially many S-repairs of $D$: $D' = D \setminus \{S(0)\}$, $D'' = D \setminus \{S(1)\}$,
$D_1 = D \setminus \{R(1,0), \ldots, R(n,0)\}$, ..., $D_{2^n} = D \setminus \{R(1,1), \ldots, R(n,1)\}$.
The C-repairs are only $D'$ and $D''$.

For the BCQ $V^\kappa$ associated to $\kappa$, $D \models V^\kappa$, and $S(1)$ and $S(0)$ are actual
causes for $V^\kappa$ (counterfactual causes with responsibility 1). All tuples in $R$
are actual causes, each with exponentially many S-minimal contingency sets.

^ Cf. [2] for an example of the latter that uses key constraints, which are DCs with 
inequalities (with violation views that contain inequality). 
For example, $R(1,0)$ has the S-minimal contingency set $\{R(2,0), \ldots, R(n,0)\}$,
among exponentially many others (any set built with just one element from
each of the pairs $\{R(2,0), R(2,1)\}$, ..., $\{R(n,0), R(n,1)\}$ is one). □


4.1 Causes for unions of conjunctive queries 

If we want to compute repairs with respect to sets of DCs from causes for
UBCQs using, say, Corollary 2, we first need an algorithm for computing the
actual causes and their (minimal) contingency sets for UBCQs. These algo¬ 
rithms could be used as a first stage of the computation of S-repairs and 
C-repairs with respect to sets of DCs. However, these algorithms (developed 
in Section are also interesting and useful per se. 

The PTIME algorithm for computing actual causes in [47] is for single
conjunctive queries, but does not compute the actual causes' contingency sets.
Actually, doing the latter increases the complexity, because deciding
responsibility of actual causes is NP-hard [47] (which would be tractable if we could
efficiently compute all (minimal) contingency sets). In principle, an algorithm
for responsibilities can be used to compute C-minimal contingency sets, by
iterating over all candidates, but Example 6 shows that there can be exponentially
many of them.

We first concentrate on the problem of computing actual causes for UBCQs, 
without their contingency sets, which requires some notation. 

Definition 3 Given $Q = C_1 \vee \cdots \vee C_k$, where each $C_i$ is a BCQ, and an instance
$D$:

(a) $\mathfrak{S}(D)$ is the collection of all S-minimal subsets of $D$ that satisfy a disjunct
$C_i$ of $Q$.

(b) $\mathfrak{S}^n(D)$ consists of the S-minimal subsets $\Lambda$ of $D^n$ for which there exists a
$\Lambda' \in \mathfrak{S}(D)$ with $\Lambda \subseteq \Lambda'$ and $\Lambda' \setminus \Lambda \subseteq D^x$. □

$\mathfrak{S}^n(D)$ contains all S-minimal sets of endogenous tuples that simultaneously
(and possibly accompanied by exogenous tuples) make the query true.
It is easy to see that $\mathfrak{S}(D)$ and $\mathfrak{S}^n(D)$ can be computed in polynomial time
in the size of $D$.

Now, generalizing a result for CQs in [47], actual causes for a UBCQ
can be computed in PTIME in the size of $D$ without computing contingency
sets. We formulate this result in terms of the corresponding causality decision
problem (CDP).

Proposition 5 Given an instance $D$, a UBCQ $Q$, and $t \in D^n$:

(a) $t$ is an actual cause for $Q$ iff there is $\Lambda \in \mathfrak{S}^n(D)$ with $t \in \Lambda$.

® For a precise formulation, see Definition [5] 

® Actually, 03 presents a PTIME algorithm for computing responsibilities for a restricted 
class of CQs. 






(b) The causality decision problem (about membership of)

    $\mathcal{CDP} := \{(D, t) \mid t \in D^n, \text{ and } t \in Causes(D, Q)\}$    (5)

belongs to PTIME.

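Proof (a) Assume $\mathfrak{S}(D) = \{\Lambda_1, \ldots, \Lambda_m\}$, and there exists a $\Lambda \in \mathfrak{S}^n(D)$ with
$t \in \Lambda$. Consider a set $\Gamma \subseteq D^n$ such that, for all $\Lambda_i \in \mathfrak{S}^n(D)$ where $\Lambda_i \neq \Lambda$,
$\Gamma \cap \Lambda_i \neq \emptyset$ and $\Gamma \cap \Lambda = \emptyset$. With such a $\Gamma$, $t$ is an actual cause for $Q$ with
contingency set $\Gamma$. So, it is good enough to prove that such a $\Gamma$ always exists. In
fact, since all the sets in $\mathfrak{S}^n(D)$ are S-minimal, then, for each $\Lambda_i \in \mathfrak{S}^n(D)$ with
$\Lambda_i \neq \Lambda$, $\Lambda_i \setminus \Lambda \neq \emptyset$. Therefore, $\Gamma$ can be obtained from the set differences
between each $\Lambda_i$ and $\Lambda$.

Now, if $t$ is an actual cause for $Q$, then there exists an S-minimal $\Gamma \subseteq D^n$
such that $D \setminus (\Gamma \cup \{t\}) \not\models Q$, but $D \setminus \Gamma \models Q$. This implies that there exists an
S-minimal subset $\Lambda$ of $D$, such that $t \in \Lambda$ and $\Lambda \models Q$. Due to the S-minimality
of $\Gamma$, it is easy to see that $t$ is included in a set in $\mathfrak{S}^n(D)$.

(b) This is a simple generalization of the proof of the same result for single
conjunctive queries found in [47]. □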


Example 7 (ex. 5 cont.) Consider the query $Q\colon \exists xy(P(x) \wedge Q(x,y)) \vee
\exists xy(P(x) \wedge R(x,y))$, and assume that for $D$, $D^n = \{P(a), R(a,c)\}$ and
$D^x = \{P(e), Q(a,b)\}$. It holds $\mathfrak{S}(D) = \{\{P(a), Q(a,b)\}, \{P(a), R(a,c)\}\}$.
Since $\{P(a)\} \subseteq \{P(a), R(a,c)\}$, $\mathfrak{S}^n(D) = \{\{P(a)\}\}$. So, $P(a)$ is the only
actual cause for $Q$. □
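The two collections of Definition 3 and the test of Proposition 5(a) can be sketched as follows for Example 7. This is a brute-force illustration under assumptions made here (tuple encoding, function names); it enumerates all subsets, so it does not reflect the polynomial-time bound stated above, which relies on the query being fixed.

```python
from itertools import combinations

# Example 7: endogenous and exogenous tuples (the encoding is an assumption).
D_n = {("P", "a"), ("R", "a", "c")}
D_x = {("P", "e"), ("Q", "a", "b")}
D = D_n | D_x

def holds(db):
    """Q: exists xy (P(x) and Q(x, y))  or  exists xy (P(x) and R(x, y))."""
    p_values = {t[1] for t in db if t[0] == "P"}
    return any(t[0] in ("Q", "R") and t[1] in p_values for t in db)

def minimal_satisfying_subsets(db):
    """S(D): the S-minimal subsets of db on which the query holds."""
    found = []
    items = sorted(db)
    for k in range(len(items) + 1):          # smallest subsets first
        for combo in combinations(items, k):
            s = frozenset(combo)
            if holds(s) and not any(f < s for f in found):
                found.append(s)
    return set(found)

def endogenous_part(s_d, endogenous):
    """S^n(D): S-minimal endogenous parts of the sets in S(D)."""
    candidates = {s & frozenset(endogenous) for s in s_d}
    return {a for a in candidates if not any(b < a for b in candidates)}

S_D = minimal_satisfying_subsets(D)
S_n = endogenous_part(S_D, D_n)
# Proposition 5(a): the actual causes are the tuples occurring in some set of S^n(D).
causes = set().union(*S_n)
```

Note that the set `{P(a), R(a,c)}` is dropped from `S_n` because its proper subset `{P(a)}` is already a candidate.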


4.2 Contingency sets for unions of conjunctive queries 

It is possible to develop a (naive) algorithm that accepts as input an in¬ 
stance D and a UBCQ Q, and returns Causes{D, Q); and also, for each 
t £ Causes{D, Q), its (set of) S-minimal contingency sets Cont{D, Q,t). 

The basis for the algorithm is a correspondence between the actual causes
for $Q$ with their contingency sets and a hitting-set problem. More precisely,
for a fixed UBCQ $Q$, consider the hitting-set framework

    $\mathfrak{H}^n(D) = \langle D^n, \mathfrak{S}^n(D) \rangle$,    (6)

with $\mathfrak{S}^n(D)$ as in Definition 3. Different computational and decision problems
are based on $\mathfrak{H}^n(D)$, and we will confront some below. Notice that hitting-sets
(HSs) are all subsets of $D^n$.

The S-minimal hitting-sets for $\mathfrak{H}^n(D)$ correspond to actual causes with
their S-minimal contingency sets for $Q$. Most responsible causes for $Q$ are in
correspondence with minimum hitting-sets for $\mathfrak{H}^n(D)$. This is formalized as follows:


If $\mathcal{C}$ is a collection of non-empty subsets of a set $S$, a subset $S' \subseteq S$ is a hitting-set for
$\mathcal{C}$ if, for every $C \in \mathcal{C}$, $C \cap S' \neq \emptyset$. $S'$ is an S-minimal hitting-set if no proper subset of it is
also a hitting-set. $S'$ is a minimum hitting-set if it has minimum cardinality.
Proposition 6 For an instance $D$, a UBCQ $Q$, and $t \in D^n$:

(a) $t$ is an actual cause for $Q$ with S-minimal contingency set $\Gamma$ iff $\Gamma \cup \{t\}$ is
an S-minimal hitting-set for $\mathfrak{H}^n(D)$.

(b) $t$ is a most responsible actual cause for $Q$ with C-minimal contingency set
$\Gamma$ iff $\Gamma \cup \{t\}$ is a minimum hitting-set for $\mathfrak{H}^n(D)$. □

The proof is similar to that of part (a) of Proposition 5.

Example 8 (ex. 5 and 7 cont.) $D$ and $Q$ are as before, but now all tuples are
endogenous. Here, $\mathfrak{S}(D) = \mathfrak{S}^n(D) = \{\{P(a), Q(a,b)\}, \{P(a), R(a,c)\}\}$. $\mathfrak{H}^n(D)$
has two S-minimal hitting-sets: $H_1 = \{P(a)\}$ and $H_2 = \{Q(a,b), R(a,c)\}$.
Each of them implicitly contains an actual cause (any of its elements) with
an S-minimal contingency set (what is left after removing the actual cause).
$H_1$ is also the C-minimal hitting-set, and contains the most responsible actual
cause, $P(a)$. □
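The correspondence of Proposition 6 can be illustrated on Example 8 with a naive enumeration of hitting-sets. The sketch below is an assumption-laden illustration (the function name `minimal_hitting_sets` and the tuple encoding are introduced here), not the algorithm of the paper.

```python
from itertools import combinations

P_a, Q_ab, R_ac = ("P", "a"), ("Q", "a", "b"), ("R", "a", "c")
# S^n(D) of Example 8, with all tuples endogenous.
family = [frozenset({P_a, Q_ab}), frozenset({P_a, R_ac})]
universe = {P_a, ("P", "e"), Q_ab, R_ac}

def minimal_hitting_sets(universe, family):
    """S-minimal subsets of universe that intersect every set in family."""
    found = []
    items = sorted(universe)
    for k in range(len(items) + 1):          # smallest candidates first
        for combo in combinations(items, k):
            h = frozenset(combo)
            if all(h & s for s in family) and not any(f < h for f in found):
                found.append(h)
    return found

mhs = minimal_hitting_sets(universe, family)
# Proposition 6(a): each element t of a minimal hitting-set H is an actual
# cause with S-minimal contingency set H - {t}.
cause_with_contingency = {(t, h - {t}) for h in mhs for t in h}
# Proposition 6(b): minimum hitting-sets give the most responsible causes.
smallest = min(len(h) for h in mhs)
most_responsible = {t for h in mhs if len(h) == smallest for t in h}
```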

Remark 4 For $\mathfrak{H}^n(D) = \langle D^n, \mathfrak{S}^n(D) \rangle$, $\mathfrak{S}^n(D)$ can be computed in PTIME in
data complexity, and its elements are bounded in size by $|Q|$, which is the
maximum number of atoms in one of $Q$'s disjuncts. This is a special kind of
hitting-set problem. For example, deciding if there is a hitting-set of size at
most $k$ has been called the d-hitting-set problem ES], and $d$ is the bound on the
size of the sets in the set class. In our case, $d$ would be $|Q|$. □


4.3 Causality, repairs, and consistent answers 

Corollary 2 and Proposition 6 can be used to compute repairs. If the classes
of S- and C-minimal hitting-sets for $\mathfrak{H}^n(D)$ (with $D^n = D$) are available,
computing S- and C-repairs will be in PTIME in the sizes of those classes.
However, it is well known that computing minimal hitting-sets is a complex
problem. Actually, as Example 6 implicitly shows, we can have exponentially
many of them in $|D|$, just as we can have exponentially many minimal repairs for $D$ with
respect to a denial constraint. We can see that the complexity of contingency
sets computation is in line with the complexities of computing hitting-sets and
repairs.

As Corollary 2 and Proposition 6 show, the computation of causes,
contingency sets, and most responsible causes via minimal/minimum hitting-set
computation can be used to compute repairs and decide about repair
questions. Since the hitting-set problems in our case are of the d-hitting-set kind,
good algorithms and approximations for the latter (cf. Section 6.1) could be
used in the context of repairs.

In the rest of this section we consider an instance $D$ whose tuples are all
endogenous, and a set $\Sigma$ of DCs. For the disjunctive violation view $V^\Sigma$, the
following result is obtained from Propositions 3 and 4 and Corollary 2.

Corollary 3 For an instance $D$, with $D^x = \emptyset$, and a set $\Sigma$ of DCs, it holds:

(a) For every $t \in Causes(D, V^\Sigma)$, there is an S-repair that does not contain $t$.

(b) For every $t \in MRC(D, V^\Sigma)$, there is a C-repair that does not contain $t$.

(c) For every $D' \in Srep(D, \Sigma)$ and $D'' \in Crep(D, \Sigma)$, it holds
$D \setminus D' \subseteq Causes(D, V^\Sigma)$ and $D \setminus D'' \subseteq MRC(D, V^\Sigma)$. □

For a projection-free, and possibly non-boolean, CQ $Q$, we are interested
in its consistent answers from $D$ with respect to $\Sigma$. For example, for $Q(x, y, z)\colon
R(x, y) \wedge S(y, z)$, the S-consistent (C-consistent) answers would be of the form
$\langle a, b, c \rangle$, where $R(a,b)$ and $S(b,c)$ belong to all S-repairs (C-repairs) of $D$.

From Corollary 3, $\langle a, b, c \rangle$ is an S-consistent (resp. C-consistent) answer iff
$R(a,b)$ and $S(b,c)$ belong to $D$, but they are not actual causes (resp. most
responsible actual causes) for $V^\Sigma$.

The following simple result and its corollary will be useful in Section [51 

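Proposition 7 For an instance $D$, with $D^x = \emptyset$, a set $\Sigma$ of DCs, and a
projection-free CQ $Q(\bar{x})\colon P_1(\bar{x}_1) \wedge \cdots \wedge P_k(\bar{x}_k)$:

(a) $\bar{c}$ is an S-consistent answer iff, for each $i$, $P_i(\bar{c}_i) \in (D \setminus Causes(D, V^\Sigma))$.

(b) $\bar{c}$ is a C-consistent answer iff, for each $i$, $P_i(\bar{c}_i) \in (D \setminus MRC(D, V^\Sigma))$. □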

Example 9 (ex. 5 cont.) Consider $Q(x)\colon P(x)$. We had $Causes(D, V^\Sigma) =
\{P(a), Q(a,b), R(a,c)\}$, $MRC(D, V^\Sigma) = \{P(a)\}$. Then, $\langle e \rangle$ is both an S- and
a C-consistent answer. □

Notice that Proposition 7 can easily be extended to conjunctions of ground
atomic queries.

Corollary 4 Given an instance $D$ and a set $\Sigma$ of DCs, the ground atomic
query $Q\colon P(c)$ is C-consistently true iff $P(c) \in D$ and it is not a most
responsible cause for $V^\Sigma$. □

Example 10 For $D = \{P(a,b), R(b,c), R(a,d)\}$ and the DC $\kappa\colon \leftarrow P(x,y), R(y,z)$,
we obtain: $Causes(D, V^\kappa) = MRC(D, V^\kappa) = \{P(a,b), R(b,c)\}$.

From Proposition 7, the ground atomic query $Q\colon R(a,d)$ is both S- and
C-consistently true in $D$ with respect to $\kappa$, because $D \setminus Causes(D, V^\kappa) =
D \setminus MRC(D, V^\kappa) = \{R(a,d)\}$. □

The CQs considered in Proposition 7 and its Corollary 4 are not particularly
interesting per se, but we will use those results to obtain new complexity
results for causality later on, e.g. Theorem 3.


5 Causes and Repairs from Consistency-Based Diagnosis 

The main objective in this section is to characterize database causality
computation as a diagnosis problem. This is interesting per se, and will also
allow us to apply ideas and techniques from model-based diagnosis to
causality. As a side result we obtain a characterization of database repairs in terms
of diagnosis.

The other direction is beyond the scope of this work. More importantly, logic-based
diagnosis in general is a much richer scenario than that of database causality. In the former,
we can have arbitrary logical specifications, whereas under data causality, we have only
monotone queries at hand.

Let $D$ be an instance for schema $\mathcal{S}$, and $Q\colon \exists\bar{x}(P_1(\bar{x}_1) \wedge \cdots \wedge P_m(\bar{x}_m))$, a
BCQ. Assume $Q$ is, possibly unexpectedly, true in $D$. So, for the associated
DC $\kappa(Q)\colon \forall\bar{x}\neg(P_1(\bar{x}_1) \wedge \cdots \wedge P_m(\bar{x}_m))$, $D \not\models \kappa(Q)$. $Q$ is our observation,
for which we want to find explanations, using a consistency-based diagnosis
approach.

For each predicate $P \in \mathcal{S}$, we introduce predicate $Ab_P$, with the same
arity as $P$. Intuitively, a tuple in its extension is abnormal for $P$. The "system
description", $SD$, includes, among other elements, the original database,
expressed in logical terms, and the DC as true "under normal conditions".

More precisely, we consider the following diagnosis problem, $\mathcal{M} = (SD, D^n, Q)$,
associated to $Q$. The FO system description, $SD$, contains the following
elements:

(a) $Th(D)$, which is Reiter's logical reconstruction of $D$ as a FO theory
(cf. Example 11).

(b) Sentence $\kappa(Q)^{ext}$, which is $\kappa(Q)$ rewritten as follows:

    $\forall\bar{x}\neg(P_1(\bar{x}_1) \wedge \neg Ab_{P_1}(\bar{x}_1) \wedge \cdots \wedge P_m(\bar{x}_m) \wedge \neg Ab_{P_m}(\bar{x}_m))$.    (7)

(c) Formula (7) can be refined by applying the abnormality predicate, $Ab$, to
endogenous tuples only. For this we need to use additional auxiliary
predicates $End_P$, with the same arity as $P \in \mathcal{S}$, which contain the endogenous
tuples in $P$'s extension (see Example 11). Accordingly, we introduce the
inclusion dependencies: For each $P \in \mathcal{S}$,

    $\forall\bar{x}(Ab_P(\bar{x}) \rightarrow End_P(\bar{x}))$, and $\forall\bar{x}(End_P(\bar{x}) \rightarrow P(\bar{x}))$.

The last entry, $Q$, in $\mathcal{M}$ is the "observation", which together with $SD$
will produce an inconsistent theory, because we make the initial and explicit
assumption that all the abnormality predicates are empty (equivalently, that
all tuples are normal), i.e. we consider, for each predicate $P$, the sentence

    $\forall\bar{x}(Ab_P(\bar{x}) \rightarrow \mathbf{false})$,    (8)

where $\mathbf{false}$ is a propositional atom that is always false.

The second entry in $\mathcal{M}$ is $D^n$. This is the set of "components" that we
can use to try to restore consistency, in this case, by (minimally) changing the
abnormality condition on tuples in $D^n$. In other words, the universal rules (7)
are subject to exceptions or qualifications: some endogenous tuples may be
abnormal. Each diagnosis shows an S-minimal set of endogenous tuples that
are abnormal.


Notice that these can also be seen as DCs, since they can be written as $\forall\bar{x}\neg Ab_P(\bar{x})$.
Example 11 (ex. 1 cont.) Consider the query $Q\colon \exists xy(S(x) \wedge R(x,y) \wedge S(y))$,
and the instance $D = \{S(a_3), S(a_4), R(a_4,a_3)\}$, with $D^n = \{S(a_4), S(a_3)\}$.
Consider the diagnostic problem $\mathcal{M} = (SD, \{S(a_4), S(a_3)\}, Q)$, with $SD$
containing the sentences in (a)-(c) below:

(a) Predicate completion axioms plus unique names assumption:

    $\forall xy(R(x,y) \leftrightarrow x = a_4 \wedge y = a_3)$,  $\forall x(S(x) \leftrightarrow x = a_3 \vee x = a_4)$,    (9)
    $\forall xy(End_R(x,y) \leftrightarrow \mathbf{false})$,  $\forall x(End_S(x) \leftrightarrow x = a_3 \vee x = a_4)$,    (10)
    $a_4 \neq a_3$.    (11)

(b) The denial constraint qualified by non-abnormality,

    $\forall xy\neg(S(x) \wedge \neg Ab_S(x) \wedge R(x,y) \wedge \neg Ab_R(x,y) \wedge S(y) \wedge \neg Ab_S(y))$.

In diagnosis formalizations this formula would usually be presented as:

    $\forall xy((\neg Ab_S(x) \wedge \neg Ab_R(x,y) \wedge \neg Ab_S(y)) \rightarrow \neg(S(x) \wedge R(x,y) \wedge S(y)))$.

That is, under the normality assumption, the "system" behaves as
intended; in this case, there are no violations of the denial constraint. This
main formula in the diagnosis specification can also be written as a
disjunctive positive rule:

    $\forall xy(S(x) \wedge R(x,y) \wedge S(y) \rightarrow Ab_S(x) \vee Ab_R(x,y) \vee Ab_S(y))$.    (12)

(c) Abnormality/endogeneity predicates are in correspondence to the database
schema, and only endogenous tuples can be abnormal:

    $\forall xy(Ab_R(x,y) \rightarrow End_R(x,y))$,  $\forall xy(End_R(x,y) \rightarrow R(x,y))$,    (13)
    $\forall x(Ab_S(x) \rightarrow End_S(x))$,  $\forall x(End_S(x) \rightarrow S(x))$.    (14)


In addition to this specification, we have the observation $Q$:

    $\exists x \exists y(S(x) \wedge R(x,y) \wedge S(y))$.    (15)

Finally, we make the assumption that there are no abnormal tuples:

    $\forall xy(Ab_R(x,y) \rightarrow \mathbf{false})$,  $\forall x(Ab_S(x) \rightarrow \mathbf{false})$.    (16)

The FO theory formed by (9)-(16) is inconsistent. □

Now, in more general terms, the observation is $Q$ (being true), obtained
by evaluating query $Q$ on (the theory of) $D$. In this case, $D \not\models \kappa(Q)$. Since all the
abnormality predicates are assumed to be empty, $\kappa(Q)$ is equivalent to $\kappa(Q)^{ext}$,
which also becomes false with respect to $D$. As a consequence, $SD \cup \{(8)\} \cup \{Q\}$
is an inconsistent FO theory. A diagnosis is a set of endogenous tuples that,
by becoming abnormal, restore consistency.
Definition 4 (a) A diagnosis for $\mathcal{M}$ is a $\Delta \subseteq D^n$, such that

    $SD \cup \{Ab_P(\bar{c}) \mid P(\bar{c}) \in \Delta\} \cup \{\neg Ab_P(\bar{c}) \mid P(\bar{c}) \in D \setminus \Delta\} \cup \{Q\}$

is consistent.

(b) $Diag^s(\mathcal{M}, t)$ denotes the set of S-minimal diagnoses for $\mathcal{M}$ that contain
tuple $t \in D^n$.

(c) $Diag^c(\mathcal{M}, t)$ denotes the set of C-minimal diagnoses in $Diag^s(\mathcal{M}, t)$. □

Example 12 (ex. 11 cont.) The theory can be made consistent by giving up
(16), and making S-minimal sets of tuples abnormal. According to (13)-(14),
those tuples have to be endogenous.

$\mathcal{M}$ has two S-minimal diagnoses: $\Delta_1 = \{S(a_3)\}$ and $\Delta_2 = \{S(a_4)\}$. The
first one corresponds to replacing the second formula in (16) by
$\forall x(Ab_S(x) \wedge x \neq a_3 \rightarrow \mathbf{false})$, obtaining now a consistent theory.

Here, $Diag^s(\mathcal{M}, S(a_3)) = Diag^c(\mathcal{M}, S(a_3)) = \{\{S(a_3)\}\}$, and
$Diag^s(\mathcal{M}, S(a_4)) = Diag^c(\mathcal{M}, S(a_4)) = \{\{S(a_4)\}\}$.

If $R(a_4,a_3)$ is also endogenous, then also $\{R(a_4,a_3)\}$ becomes a minimal
diagnosis. □
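The diagnoses of Example 12 can be reproduced without a theorem prover under one simplifying assumption, made only for this sketch: with the abnormality atoms fixed as in Definition 4, consistency reduces to checking that every match of $Q$ in $D$ contains at least one abnormal tuple, as required by rule (12). The names `witnesses` and `minimal_diagnoses` are illustrative.

```python
from itertools import combinations

S_a3, S_a4, R_a4a3 = ("S", "a3"), ("S", "a4"), ("R", "a4", "a3")
D = {S_a3, S_a4, R_a4a3}

def witnesses(db):
    """Tuple sets {S(x), R(x, y), S(y)} in db that make Q of Example 11 true."""
    found = []
    for r in db:
        if r[0] == "R" and ("S", r[1]) in db and ("S", r[2]) in db:
            found.append(frozenset({("S", r[1]), r, ("S", r[2])}))
    return found

def minimal_diagnoses(db, endogenous):
    """S-minimal sets of endogenous tuples hitting every witness of Q.

    Assumption of this sketch: rule (12) is the only constraint that the
    choice of abnormal tuples has to satisfy.
    """
    ws = witnesses(db)
    found = []
    items = sorted(endogenous)
    for k in range(len(items) + 1):          # smallest candidates first
        for combo in combinations(items, k):
            delta = frozenset(combo)
            if all(delta & w for w in ws) and not any(f < delta for f in found):
                found.append(delta)
    return found

diag_partly_endogenous = minimal_diagnoses(D, {S_a3, S_a4})
diag_all_endogenous = minimal_diagnoses(D, D)
```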

By definition, $Diag^c(\mathcal{M}, t) \subseteq Diag^s(\mathcal{M}, t)$. Diagnoses for $\mathcal{M}$ and actual
causes for $Q$ are related.

Proposition 8 Consider an instance $D$, a BCQ $Q$, and the diagnosis problem
$\mathcal{M}$ associated to $Q$. Tuple $t \in D^n$ is an actual cause for $Q$ iff $Diag^s(\mathcal{M}, t) \neq \emptyset$.
□


The responsibility of an actual cause $t$ is determined by the cardinality of
the diagnoses in $Diag^c(\mathcal{M}, t)$.

Proposition 9 For an instance $D$, a BCQ $Q$, the associated diagnosis
problem $\mathcal{M}$, and a tuple $t \in D^n$, it holds:

(a) $\rho_D(t) = 0$ iff $Diag^s(\mathcal{M}, t) = \emptyset$.

(b) Otherwise, $\rho_D(t) = \frac{1}{|\Delta|}$, where $\Delta \in Diag^c(\mathcal{M}, t)$. □

For the proofs of Propositions 8 and 9 it is easy to verify that the conflict
sets of $\mathcal{M}$ coincide with the sets in $\mathfrak{S}(D^n)$ (cf. Definition 3). The results are
obtained from the characterization of minimal diagnoses as minimal hitting-sets
of sets of conflict sets (cf. Section [5] and [53]) and Proposition 6.

Example 13 (ex. 12 cont.) From Propositions 8 and 9, $S(a_3)$ and $S(a_4)$ are
actual causes, with responsibility 1. If $R(a_4,a_3)$ is also endogenous, it also
becomes an actual cause with responsibility 1. □

In consistency-based diagnosis, minimal diagnoses can be obtained as S- 
minimal hitting-sets of the collection of S-minimal conflict sets (cf. Section |3|) 
[53]. In our case, conflict sets are S-minimal sets of endogenous tuples that,
if not abnormal (only endogenous ones can be abnormal), and together, and
possibly in combination with exogenous tuples, make (7) false.
It is easy to verify that the conflict sets of $\mathcal{M}$ coincide with the sets in
$\mathfrak{S}(D^n)$ (cf. Definition 3 and Remark 4). As a consequence, conflict sets for $\mathcal{M}$
can be computed in PTIME, the hitting-sets for $\mathcal{M}$ contain actual causes for
$Q$, and the hitting-set problem for the diagnosis problems is of the d-hitting-set
kind.

The reduction from causality to consistency-based diagnosis allows us to 
apply constructions and techniques for the latter (cf. p7ll49] b to the former. 

Example 14 (ex. 11 cont.) The diagnosis problem $\mathcal{M} = (SD, \{S(a_4), S(a_3)\}, Q)$
gives rise to the hitting-set framework $\mathfrak{H}^n(D) = \langle \{S(a_4), S(a_3)\}, \{\{S(a_3), S(a_4)\}\} \rangle$,
with $\{S(a_3), S(a_4)\}$ corresponding to the conflict set $c = \{S(a_4), S(a_3)\}$.

$\mathfrak{H}^n(D)$ has two minimum hitting-sets: $\{S(a_3)\}$ and $\{S(a_4)\}$, which are the
S-minimal diagnoses for $\mathcal{M}$. Then, the two tuples are actual causes for $Q$ (cf.
Proposition 8). From Proposition 9, $\rho_D(S(a_3)) = \rho_D(S(a_4)) = 1$. □

The solutions to the diagnosis problem can be used for computing repairs. 


Proposition 10 Consider an instance $D$ with $D^x = \emptyset$, a set of DCs of the
form $\kappa\colon \forall\bar{x}\neg(P_1(\bar{x}_1) \wedge \cdots \wedge P_m(\bar{x}_m))$, and their associated "abnormality-aware"
integrity constraints in (7) (in this case we do not need $End_P$ atoms).

Each S-minimal diagnosis $\Delta$ gives rise to an S-repair of $D$, namely $D_\Delta =
D \setminus \{P(\bar{c}) \in D \mid Ab_P(\bar{c}) \in \Delta\}$; and every S-repair can be obtained in this way.
Similarly, for C-repairs using C-minimal diagnoses. □

Example 15 (ex. 13 cont.) The instance $D = \{S(a_3), S(a_4), R(a_4,a_3)\}$, with
all tuples endogenous, has three (both S- and C-) repairs with respect to
the DC $\kappa\colon \forall xy\neg(S(x) \wedge R(x,y) \wedge S(y))$, namely $D_1 = \{S(a_4), R(a_4,a_3)\}$,
$D_2 = \{S(a_3), R(a_4,a_3)\}$, and $D_3 = \{S(a_3), S(a_4)\}$. They can be obtained as
$D_{\Delta_1}, D_{\Delta_2}, D_{\Delta_3}$ from the only (S- and C-) diagnoses, $\Delta_1 = \{S(a_3)\}$,
$\Delta_2 = \{S(a_4)\}$, $\Delta_3 = \{R(a_4,a_3)\}$, resp. □

We have characterized repairs in terms of diagnosis. Thinking of the other 
direction, and as a final remark, it is worth observing that the very particular 
kind of diagnosis problem we introduced above (with restricted logical for¬ 
mulas) can be formulated as a preferred-repair problem [21 Sec. 2.5]. Without 
going into the details, the idea is to materialize tables for the auxiliary
predicates $Ab_P$ and $End_P$, and consider the DCs of the form (7) (with the $End_P$
atoms when not all tuples are endogenous), plus the DCs (8), saying that the
initial extensions for the $Ab_P$ predicates are empty. If $D$ is inconsistent with
respect to this set of DCs, the S-repairs that are obtained by only inserting
endogenous tuples into the extensions of the $Ab_P$ predicates correspond to
S-minimal diagnoses, and each S-minimal diagnosis can be obtained in this
way.


Notice that these are not denial constraints. 
6 Complexity Results

There are three main computational problems in database causality. For a BCQ
$Q$ and database $D$:

(a) The causality problem (CP) is about computing the actual causes for $Q$. The
decision version of this problem, CDP, is stated in (5). Both CP and CDP
are solvable in polynomial time [47], which can be extended to UBCQs (cf.
Proposition 5).

(b) The responsibility problem (RP) is about computing the responsibility
$\rho_D(t)$ of a given actual cause $t$. (Since a tuple that is not an actual cause
has responsibility 0, this problem subsumes (a).) This is a maximization
problem due to the minimization of $|\Gamma|$ in the denominator.

We will consider the decision version of this problem that, as usual for
maximization problems [29], asks whether the real-valued function being
computed (responsibility in this case) takes a value greater than a given
threshold $v$ of the form $\frac{1}{k}$, for a positive integer $k$.

Definition 5 For a BCQ $Q$, the responsibility decision problem (RDP) is
(deciding about membership of):

    $\mathcal{RDP}(Q) = \{(D, t, v) \mid t \in D^n,\ v \in \{0\} \cup \{\frac{1}{k} \mid k \in \mathbb{N}^+\},\ D \models Q \text{ and } \rho_D(t) > v\}$,

that is, deciding if a tuple has a responsibility greater than a bound $v$ (as a
cause for $Q$). □

The complexity analysis of RDP in [47] is restricted to conjunctive queries
without self-joins. Here, we will generalize the complexity analysis for RDP to
general CQs.

(c) Computing the most responsible actual causes (MRC). Its decision version, 
MRCDP, the most responsible cause decision problem, is a natural problem, 
because actual causes with the highest responsibility tend to provide most 
interesting explanations for query answers mM- 

Definition 6 For a BCQ $Q$, the most responsible cause decision problem is
(membership of):

    $\mathcal{MRCDP}(Q) = \{(D, t) \mid t \in D^n \text{ and } 0 < \rho_D(t) \text{ is a maximum for } D\}$. □


We start by analyzing a more basic decision problem, that of deciding if 
a set of tuples T is an S-minimal contingency set associated to a cause t (cf. 
Q). Due to the results in Sections 3 and 4, it is clear that there is a close
connection between this problem and the S-repair checking problem [9, Chap.
5], about deciding if instance $D'$ is an S-repair of instance $D$ with respect to
a set of integrity constraints. Actually, the following result is obtained from
the PTIME solvability of the S-repair checking problem for DCs [18].
Proposition 11 For a BCQ $Q$, the minimal contingency set decision
problem (MCSDP), i.e. $\mathcal{MCSDP}(Q) := \{(D, t, \Gamma) \mid \Gamma$ is a minimal element in
$Cont(D, Q, t)\}$, belongs to PTIME.

Proof To decide if $(D, t, \Gamma) \in \mathcal{MCSDP}(Q)$, it is good enough to observe, from
Proposition 3, that $(D, t, \Gamma) \in \mathcal{MCSDP}(Q)$ iff $D \setminus (\Gamma \cup \{t\})$ is an S-repair for
$D$ with respect to $\kappa(Q)$. S-repair checking can be done in PTIME in data [18]. □
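Membership in MCSDP can also be tested directly from the definition of a contingency set, instead of going through repair checking as in the proof. The sketch below does this for the data of Example 5; it is an illustration only, with names chosen here. It uses the fact that the query is monotone: if some proper subset of $\Gamma$ were a contingency set, then so would be $\Gamma$ minus a single tuple, so only $|\Gamma|$ smaller candidates have to be inspected.

```python
# Instance and violation view of Example 5; the tuple encoding is an assumption.
D = frozenset({("P", "a"), ("P", "e"), ("Q", "a", "b"), ("R", "a", "c")})

def holds(db):
    """Q: exists xy (P(x) and Q(x, y))  or  exists xy (P(x) and R(x, y))."""
    p_values = {t[1] for t in db if t[0] == "P"}
    return any(t[0] in ("Q", "R") and t[1] in p_values for t in db)

def is_minimal_contingency_set(db, t, gamma):
    """Check that gamma is an S-minimal contingency set for t.

    Uses |gamma| + 1 contingency checks, each with two query evaluations.
    """
    gamma = frozenset(gamma)
    if t not in db or t in gamma or not gamma <= db:
        return False

    def is_contingency(g):
        return holds(db - g) and not holds(db - g - {t})

    if not is_contingency(gamma):
        return False
    # Monotonicity of Q: it suffices to try dropping one tuple at a time.
    return not any(is_contingency(gamma - {s}) for s in gamma)
```

For instance, `is_minimal_contingency_set(D, ("Q", "a", "b"), {("R", "a", "c")})` accepts, while the non-minimal set `{("Q", "a", "b")}` is rejected for `("P", "a")`.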

We could also consider the decision problem defined in Proposition 11, but
with C-minimal $\Gamma$. We will not use results about this problem in the following.
Furthermore, its connection with the C-repair checking problem is less direct.
As one can see from Section 3, C-minimal contingency sets correspond to
a repair semantics somewhere between the S-minimal and C-minimal repair
semantics (a subclass of Srep, but a superclass of Crep): It is about an
S-minimal repair with minimum cardinality that does not contain a particular
tuple.

Now we establish that RDP is NP-complete for CQs in general. The
NP-hardness is shown in [47]. Membership of NP is obtained using Proposition 11.

Theorem 1 (a) For every BCQ $Q$, $\mathcal{RDP}(Q) \in$ NP.

(b) [47] There are CQs $Q$ for which $\mathcal{RDP}(Q)$ is NP-hard.

Proof (a) We give a non-deterministic PTIME algorithm to solve RDP.
Non-deterministically guess a subset $\Gamma \subseteq D^n$, return yes if $\frac{1}{1+|\Gamma|} > v$ and
$(D, t, \Gamma) \in \mathcal{MCSDP}$; otherwise return no. According to Proposition 11 this can be
done in PTIME in data complexity. □

In order to better understand the complexity of RP, the responsibility 
computation problem, we will investigate the functional, non-decision version 
of RDP. 

The main source of complexity when computing responsibilities is related
to the hitting-set problem associated to $\mathfrak{H}^n(D) = \langle D^n, \mathfrak{S}^n(D) \rangle$ in Remark 4
(cf. (6)). In this case, it is about computing the cardinality of a minimum
hitting-set that contains a given vertex (tuple) t. That this is a kind of d- 
hitting-set problem [SO] will be useful in Section o 

Remark 5 Our responsibility problem can also be seen as a vertex cover
problem on the hypergraph

    $\mathfrak{G}^n(D) = \langle D^n, \mathfrak{S}^n(D) \rangle$    (17)

associated to $\mathfrak{H}^n(D) = \langle D^n, \mathfrak{S}^n(D) \rangle$ (that is, the hitting-set framework can
be seen as a hypergraph). In it, the hyperedges are the members of $\mathfrak{S}^n(D)$.
Determining the responsibility of a tuple $t$ becomes the problem on
hypergraphs of determining the size of a minimum vertex cover that contains vertex
$t$ (among all vertex covers that contain the vertex). Again, in this problem the
hyperedges are bounded in size by $|Q|$. □

In a hypergraph $\mathcal{H}$, a set of vertices is a vertex cover if it intersects every hyperedge.
A minimal vertex cover has no proper subset that is also a vertex cover. A minimum vertex
cover has minimum cardinality among the vertex covers. Similarly, an independent set of
$\mathcal{H}$ is a set of vertices such that no pair of them is contained in a hyperedge. Maximal and
maximum independent sets are defined in an obvious manner.

Example 16 For $Q\colon \exists xy(P(x) \wedge R(x,y) \wedge P(y))$, and $D = D^n = \{P(a), P(c),
R(a,c), R(a,a)\}$, $\mathfrak{S}(D) = \mathfrak{S}^n(D) = \{\{P(a), R(a,a)\}, \{P(a), P(c), R(a,c)\}\}$.

The hypergraph $\mathfrak{G}^n(D)$ has $D$ as set of vertices, and its hyperedges are
$\{P(a), R(a,a)\}$ and $\{P(a), P(c), R(a,c)\}$. Its minimal vertex covers are: $vc_1 =
\{P(a)\}$, $vc_2 = \{P(c), R(a,a)\}$, $vc_3 = \{R(a,a), R(a,c)\}$. Only the first has
minimum cardinality. Accordingly, its only element, $P(a)$, is an actual cause
with responsibility 1. The other tuples are actual causes with responsibility
$\frac{1}{2}$. □
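The responsibilities in Example 16 can be recomputed from the hypergraph alone. The sketch below is illustrative and uses names introduced here; as a safeguard it also requires that the cover stops being a cover when $t$ is dropped, which mirrors the contingency-set condition $D \setminus \Gamma \models Q$.

```python
from fractions import Fraction
from itertools import combinations

P_a, P_c = ("P", "a"), ("P", "c")
R_ac, R_aa = ("R", "a", "c"), ("R", "a", "a")
vertices = {P_a, P_c, R_ac, R_aa}
# Hyperedges of Example 16, i.e. the members of S^n(D).
hyperedges = [frozenset({P_a, R_aa}), frozenset({P_a, P_c, R_ac})]

def is_cover(candidate):
    """A vertex cover intersects every hyperedge."""
    return all(candidate & edge for edge in hyperedges)

def responsibility(t):
    """1 / (size of a smallest vertex cover that contains t and needs t)."""
    others = sorted(vertices - {t})
    for k in range(len(others) + 1):         # smallest contingency sets first
        for combo in combinations(others, k):
            gamma = frozenset(combo)
            if is_cover(gamma | {t}) and not is_cover(gamma):
                return Fraction(1, 1 + k)
    return Fraction(0)                        # t is not an actual cause

resp = {t: responsibility(t) for t in vertices}
```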

Remark 6 To simplify the presentation of the next computational problems
(Lemmas 1 and 2 and Proposition 12), we will formulate and address them
in terms of graphs. However, they still hold for hypergraphs [15, 44], which is
what we need for the complexity results obtained in the rest of this section. □

Lemma 1 (representation lemma) There is a fixed database schema $\mathcal{S}$ and
a BCQ $Q \in L(\mathcal{S})$, without built-ins, such that, for every graph $G = (V, E)$,
with non-empty $E$, and $v \in V$, there is an instance $D$ for $\mathcal{S}$ and a tuple $t \in D$,
such that the size of a minimum vertex cover of $G$ containing $v$ is the inverse
of the responsibility of $t$ as an actual cause for $Q$.

Proof Consider a graph $G = (V, E)$, and assume the vertices of $G$ are uniquely
labeled.

Consider the database schema with relations $Ver(v_0)$ and $Edges(v_1, v_2, e)$,
and the conjunctive query $Q\colon \exists v_1 v_2 e(Ver(v_1) \wedge Ver(v_2) \wedge Edges(v_1, v_2, e))$.
$Ver$ stores the vertices of $G$, and $Edges$, the labeled edges. For each edge
$(v_1, v_2) \in E$, $Edges$ contains $n$ tuples of the form $(v_1, v_2, i)$, where $n$ is the
number of vertices in $G$. All the values in the third attribute of $Edges$ are
different, say from 1 to $n \times |E|$. This padding of relation $Edges$ will ensure in
the rest of the proof that C-minimal contingency sets for the query answer
consist only of vertices, i.e. elements of $Ver$ (as opposed to $Edges$ tuples). The
size of the padded instance is still polynomial in the size of $G$. It is clear that
$D \models Q$.

Assume $VC$ is the minimum vertex cover of $G$ that contains vertex $v$, where
tuple $t$ is $Ver(v)$. Consider the set of tuples $\Lambda = \{Ver(x) \mid x \in VC\}$. Since
$v \in VC$, $\Lambda = \Lambda' \cup \{Ver(v)\}$. Then, $D \setminus (\Lambda' \cup \{Ver(v)\}) \not\models Q$. This is because for
every tuple $Edges(v_i, v_j, k)$ in the instance, either $v_i$ or $v_j$ belongs to $VC$. Due
to the minimality of $VC$, $D \setminus \Lambda' \models Q$. Therefore, tuple $Ver(v)$ is an actual
cause for $Q$.

We recall that repairs of databases with respect to DCs can be characterized as maximal 
independent sets of conflict hypergraphs (conflict graphs in the case of FDs) whose vertices 
are the database tuples, and hyperedges connect tuples that together violate a DC lillTsl . 
Suppose $\Gamma$ is a C-minimal contingency set associated to $Ver(v)$. Due to
the C-minimality of $\Gamma$, it entirely consists of tuples in $Ver$. It holds that
$D \setminus (\Gamma \cup \{Ver(v)\}) \not\models Q$ and $D \setminus \Gamma \models Q$. Consider the set $VC' = \{x \mid Ver(x) \in
\Gamma\} \cup \{v\}$. Since $D \setminus (\Gamma \cup \{Ver(v)\}) \not\models Q$, for every tuple $Edges(v_i, v_j, k)$ in $D$,
either $v_i \in VC'$ or $v_j \in VC'$. Therefore, $VC'$ is a minimum vertex cover of $G$
that contains $v$. It holds that $\rho_D(Ver(v)) = \frac{1}{1+|\Gamma|}$. So, the size of a minimum
vertex cover of $G$ that contains $v$ can be obtained from $\rho_D(Ver(v))$. □
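The construction in the proof can be tried out on a small graph. The sketch below builds the padded instance for a path with three vertices and compares, by brute force, the inverse responsibility of each $Ver(v)$ with the size of a smallest vertex cover containing $v$. Function names and the tuple encoding are assumptions of this sketch, and the graph was chosen so that every smallest cover containing $v$ actually needs $v$; the check is a sanity test on one input, not a proof of the lemma.

```python
from itertools import combinations

def build_instance(vertices, edges):
    """Ver holds the vertices; each edge is padded into n labeled Edges tuples."""
    n = len(vertices)
    db = {("Ver", v) for v in vertices}
    label = 0
    for u, w in edges:
        for _ in range(n):
            label += 1                        # labels run from 1 to n * |E|
            db.add(("Edges", u, w, label))
    return frozenset(db)

def holds(db):
    """Q: exists v1 v2 e (Ver(v1) and Ver(v2) and Edges(v1, v2, e))."""
    present = {t[1] for t in db if t[0] == "Ver"}
    return any(
        t[0] == "Edges" and t[1] in present and t[2] in present for t in db
    )

def inverse_responsibility(db, t):
    """1 + size of a smallest contingency set for t; None if t is not a cause."""
    others = sorted(db - {t}, key=repr)
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            gamma = frozenset(combo)
            if holds(db - gamma) and not holds(db - gamma - {t}):
                return k + 1
    return None

def min_cover_size_containing(vertices, edges, v):
    """Size of a smallest vertex cover of the graph that contains v."""
    others = sorted(set(vertices) - {v})
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            cover = set(combo) | {v}
            if all(u in cover or w in cover for u, w in edges):
                return len(cover)

V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c")]                  # path a - b - c
D = build_instance(V, E)
```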

Having represented our responsibility problem as a graph-theoretic prob¬ 
lem, we first consider functional computational problems in graphs. 

Definition 7 The minimal vertex cover membership problem (MVCMP)
consists in, given a graph $G = (V, E)$, and a vertex $v \in V$ as inputs, computing
the size of a minimum vertex cover of $G$ that contains $v$. □

Lemma 2 Given a graph G and a vertex v in it, there is a graph G' extending 
G that can be constructed in polynomial time in |G|, such that the size of a 
minimum vertex cover for G that contains v and the size of a minimum vertex 
cover for G' coincide. 

Proof The size of $VC_G(v)$, the minimum vertex cover of $G$ that contains the
vertex $v$, can be computed from the size of $I_G$, the maximum independent set
of $G$ that does not contain $v$. In fact,

    $|VC_G(v)| = |G| - |I_G|$.    (18)

Since $I_G$ is a maximum independent set that does not contain $v$, it must
contain one of the adjacent vertices of $v$ (otherwise, $I_G$ is not maximum, and
$v$ can be added to $I_G$). Therefore, $|VC_G(v)|$ can be computed from the size of
a maximum independent set $I$ that contains $v'$, one of the adjacent vertices of
$v$.

Given a graph $G$ and a vertex $v'$ in it, a graph $G'$ that extends $G$ can be
constructed in polynomial time in the size of $G$, in such a way that: there
is a maximum independent set $I$ of $G$ containing $v'$ iff $v'$ belongs to every
maximum independent set of $G'$ iff the sizes of maximum independent sets for
$G$ and $G'$ differ by one. Actually, graph $G'$ can be obtained by adding a new
vertex $v''$ that is connected only to the neighbors of $v'$. It holds

    $|I_G| = |I_{G'}| - 1$,    (19)

    $|I_{G'}| = |G'| - |VC_{G'}|$,    (20)

where $I_{G'}$ is a maximum independent set in $G'$, and $VC_{G'}$ is a minimum vertex
cover of $G'$. From (18), (19) and (20), we obtain: $|VC_G(v)| = |VC_{G'}|$. □

From Lemma 2 and the $FP^{NP(\log(n))}$-completeness of determining the size
of a maximum clique in a graph [39], we obtain:


This construction is inspired by |431 Lemma 1]. More details can be found in m- 
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Proposition 12 The MVCMP problem for graphs is _F'P'^'^^*°®^"'^^-complete. 

Proof We prove membership by describing an algorithm in $FP^{NP(\log(n))}$ for computing the size of the minimum vertex cover of a graph $G = (V, E)$ that contains a vertex $v \in V$. We use Lemma 2 and build the extended graph $G'$.

The size of a minimum vertex cover for $G'$ gives the size of the minimum vertex cover of $G$ that contains $v$. Since computing the maximum cardinality of a clique can be done in time $FP^{NP(\log(n))}$, computing a minimum vertex cover can be done in the same time (just consider the complement graph). Therefore, MVCMP belongs to $FP^{NP(\log(n))}$.

Hardness can be obtained by a reduction from computing minimum vertex covers in graphs to MVCMP. Given a graph $G$, construct the graph $G'$ as follows: add a vertex $v$ to $G$ and connect it to all vertices of $G$. It is easy to see that $v$ belongs to all minimum vertex covers of $G'$. Furthermore, the sizes of minimum vertex covers for $G$ and $G'$ differ by one. Consequently, the size of a minimum vertex cover of $G$ can be obtained from the size of a minimum vertex cover of $G'$ that contains $v$. Computing the minimum vertex cover is $FP^{NP(\log(n))}$-complete. This follows from the $FP^{NP(\log(n))}$-completeness of computing the maximum cardinality of a clique in a graph [39]. □
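The reduction used for hardness can be checked on a small instance. The sketch below is an illustration under stated assumptions: `min_cover_size` and `add_universal_vertex` are hypothetical helper names, the 4-cycle is an arbitrary test graph, and the exhaustive search is only feasible for tiny inputs.

```python
from itertools import combinations

def min_cover_size(vertices, edges, must_contain=()):
    """Smallest size of a vertex cover that includes all of `must_contain`."""
    forced = set(must_contain)
    free = [u for u in vertices if u not in forced]
    for extra in range(len(free) + 1):
        for rest in combinations(free, extra):
            cover = forced | set(rest)
            if all(a in cover or b in cover for a, b in edges):
                return len(cover)

def add_universal_vertex(vertices, edges, new_vertex):
    """Return G' = G plus `new_vertex` joined to every vertex of G."""
    return vertices + [new_vertex], edges + [(new_vertex, u) for u in vertices]

# Hypothetical test graph: a 4-cycle.
cycle_vertices = [1, 2, 3, 4]
cycle_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
ext_vertices, ext_edges = add_universal_vertex(cycle_vertices, cycle_edges, "v")

plain = min_cover_size(cycle_vertices, cycle_edges)
forced = min_cover_size(ext_vertices, ext_edges, must_contain=["v"])
```

Once the new vertex is in the cover, all of its incident edges are covered, so what remains is exactly a vertex cover of the original graph; this is why the two sizes differ by one.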


Theorem 2 (a) For every BCQ, $Q$, computing the responsibility of a tuple as a cause for $Q$ is in $FP^{NP(\log(n))}$.

(b) There is a database schema and a BCQ $Q$, without built-ins, such that computing the responsibility of a tuple as a cause for $Q$ is $FP^{NP(\log(n))}$-complete.

Proof For membership, we observe from Remark 5 that computing a tuple's responsibility amounts to computing the size of a minimum vertex cover containing the tuple in the graph associated to the query and instance at hand. By Proposition 12, this problem belongs to $FP^{NP(\log(n))}$.

Hardness follows from Lemma 1 and the hardness result in Proposition 12.

□ 
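As an illustration of what is being computed, the sketch below evaluates responsibility directly from the counterfactual definition rather than through vertex covers. It relies on assumptions that are not spelled out in this section: all tuples are treated as endogenous, the query is represented by its minimal witness sets (the sets of tuples that jointly make it true), and responsibility is taken to be $1/(1+|\Gamma|)$ for a smallest contingency set $\Gamma$. The function name and tuple labels are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def responsibility(tuples, witnesses, t):
    """Responsibility of t, by exhaustive search for a smallest contingency set.

    `witnesses` lists the minimal sets of tuples that make the query true.
    gamma is a contingency set for t if the query still holds after deleting
    gamma, but no longer holds after additionally deleting t.
    """
    others = [u for u in tuples if u != t]
    for size in range(len(others) + 1):
        for gamma in map(set, combinations(others, size)):
            still_true = any(not (w & gamma) for w in witnesses)
            then_false = all(w & (gamma | {t}) for w in witnesses)
            if still_true and then_false:
                return Fraction(1, 1 + size)
    return Fraction(0)  # t is not an actual cause

# Hypothetical instance: two disjoint witness sets, plus an irrelevant tuple.
db = ["a1", "j1", "a2", "j2", "x"]
wit = [frozenset({"a1", "j1"}), frozenset({"a2", "j2"})]
```

Here each tuple in a witness needs one tuple of the other witness removed first, so its responsibility is 1/2, while the tuple `x` is not a cause at all.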

Now we address the most responsible causes problem, MRCDP (cf. Definition [i. We use the connection with consistent query answering of Section 1131 namely CorollarylH and the $P^{NP(\log(n))}$-completeness of consistent query answering under the C-repair semantics for queries that are conjunctions of ground atoms and a particular DC [43, Theorem 4].

Theorem 3 (a) For every BCQ $Q$, $\mathcal{MRCDP}(Q) \in P^{NP(\log(n))}$.

(b) There is a database schema and a BCQ $Q$, without built-ins, for which $\mathcal{MRCDP}(Q)$ is $P^{NP(\log(n))}$-complete.

Proof (a) To show that $\mathcal{MRCDP}(Q)$ belongs to $P^{NP(\log(n))}$, consider first the hitting-set framework $\mathfrak{H}^n(D) = \langle D^n, \mathfrak{S}^n(D)\rangle$ (cf. Definition [31 and E]) and its associated hypergraph $\mathfrak{G}^n(D)$ (cf. (fTTll l.

It holds that $t$ is a most responsible cause for $Q$ iff $\mathfrak{H}^n(D)$ has a C-minimal hitting-set that contains $t$ (cf. PropositionjB]). Therefore, $t$ is a most responsible cause for $Q$ iff $t$ belongs to some minimum vertex cover of $\mathfrak{G}^n(D)$.

It is easy to see that $\mathfrak{G}^n(D)$ has a minimum vertex cover that contains $t$ iff $\mathfrak{G}^n(D)$ has a maximum independent set that does not contain $t$. Checking if $t$ belongs to all maximum independent sets of $\mathfrak{G}^n(D)$ can be done in $P^{NP(\log(n))}$ [43, Lemma 2].

If $t$ belongs to all maximum independent sets of $\mathfrak{G}^n(D)$, then $(D, t) \notin \mathcal{MRCDP}(Q)$; otherwise $(D, t) \in \mathcal{MRCDP}(Q)$. As a consequence, the decision can be made in time $P^{NP(\log(n))}$.

(b) The proof is by a reduction, via Corollary|TJ from consistent query answering under the C-repair semantics for queries that are conjunctions of ground atoms, which was proved to be $P^{NP(\log(n))}$-complete in [43, Theorem 4]. Actually, that proof (of hardness) uses a particular database schema S and a DC $\kappa$. In our case, we can use the same schema S and the violation query $V^{\kappa}$ associated to $\kappa$ (cf. Section 0]). □

From Proposition inland the $FP^{NP(\log(n))}$-completeness of determining the size of C-repairs for DCs [33, Theorem 3], we obtain the following for the computation of the highest responsibility value.

Proposition 13 (a) For every BCQ, computing the responsibility of the most responsible causes is in $FP^{NP(\log(n))}$.

(b) There is a database schema and a BCQ $Q$, without built-ins, for which computing the responsibility of the most responsible causes is $FP^{NP(\log(n))}$-complete.

Proof (a) To show membership in $FP^{NP(\log(n))}$, consider the hypergraph $\mathfrak{G}^n(D)$ as obtained in Theorem 3. The responsibility of most responsible causes for $Q$ can be obtained from the size of the minimum vertex cover of $\mathfrak{G}^n(D)$ (cf. Proposition HI). The size of the minimum vertex cover in a graph can be computed in $FP^{NP(\log(n))}$, which is obtained from the membership in $FP^{NP(\log(n))}$ of computing the maximum cardinality of a clique in a graph [39].

It is easy to verify that minimum vertex covers in hypergraphs can be computed in the same time.

(b) This is by a reduction from the problem of determining the size of C-repairs for DCs, shown to be $FP^{NP(\log(n))}$-complete in [33, Theorem 3]. Actually, that proof (of hardness) uses a particular database schema S and a DC $\kappa$. In our case, we may consider the same schema S and the violation query $V^{\kappa}$ associated to $\kappa$ (cf. Section 131).

The size of C-repairs for an inconsistent instance $D$ of the schema S with respect to $\kappa$ can be obtained from the responsibility of most responsible causes for $V^{\kappa}$ (cf. Corollary HI). □


6.1 FPT of responsibility

We need to cope with the intractability of computing most responsible causes. The area of fixed parameter tractability (FPT) provides tools to attack this problem. In this regard, we recall that a decision problem with inputs of the form $(I, p)$, where $p$ is a distinguished parameter of the input, is fixed parameter tractable (or belongs to the class FPT), if it can be solved in time $O(f(|p|) \cdot |I|^c)$, where $c$ and the hidden constant do not depend on $|p|$ or $|I|$, and $f$ does not depend on $|I|$.

In our case, the parameterized version of the decision problem $\mathcal{RDP}(Q)$ (cf. Definition [S]) is denoted with $\mathcal{RDP}^p(Q)$, and the distinguished parameter is $k$, such that $v = \frac{1}{k}$.

That $\mathcal{RDP}^p(Q)$ belongs to FPT can be obtained from its formulation as a $d$-hitting-set problem ($d$ being the fixed upper bound on the size of the sets in the set class). The latter problem consists in, given a hitting-set framework with $d$-bounded subsets and an element $t$ (a tuple in our case), deciding if there is a hitting-set of cardinality smaller than $k$ that contains $t$. This problem belongs to FPT.

Theorem 4 For every BCQ $Q$, $\mathcal{RDP}^p(Q)$ belongs to FPT, where the parameter is the inverse of the responsibility bound.

Proof First, there is a PTIME parameterized algorithm for the $d$-hitting-set problem about deciding if there is a hitting-set of size at most $k$ that runs in time $O(e^k + n)$, with $n$ the size of the underlying set and $e = d - 1 + o(d^{-1})$ [5(1]. In our case, $n = |D|$, and $d = |Q|$ (cf. also [26]).

Now, to decide if the responsibility of a given tuple $t$ is greater than $v = \frac{1}{k}$, we consider the associated hypergraph $\mathfrak{G}^n(D)$, and we decide if it has a vertex cover that contains $t$ and whose size is less than $k$. In order to answer this, we use Lemma 2 and build the extended hypergraph $\mathfrak{G}'$.

The size of a minimum vertex cover for $\mathfrak{G}'$ gives the size of the minimum vertex cover of $\mathfrak{G}^n(D)$ that contains $t$. If $\mathfrak{G}^n(D)$ has a vertex cover that contains $t$ of size less than $k$, then $\mathfrak{G}'$ has a vertex cover of size less than $k$. If $\mathfrak{G}'$ has a vertex cover of size less than $k$, its minimum size for a vertex cover is less than $k$. Since this minimum is the same as the size of a minimum vertex cover for $\mathfrak{G}^n(D)$ that contains $t$, $\mathfrak{G}^n(D)$ has a vertex cover of size less than $k$ that contains $t$. As a consequence, it is good enough to decide if $\mathfrak{G}'$ has a vertex cover of size less than $k$. For this, we use the hitting-set formulation of this hypergraph problem, and the already mentioned FPT algorithm. □
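To give a feel for why the bounded set size $d$ and the parameter $k$ together yield fixed-parameter tractability, here is a minimal sketch of the textbook bounded-search-tree procedure for $d$-hitting-set. It is not the $O(e^k + n)$ algorithm cited in the proof, and it handles the requirement "contains $t$" by forcing $t$ into the solution instead of going through the extended hypergraph of Lemma 2. The function names are hypothetical, and the bound used here is "size at most $k$".

```python
def has_hitting_set(sets, k, chosen=frozenset()):
    """Can `chosen` be extended by at most k more elements into a hitting set?

    Bounded search tree: pick any set that is not yet hit and branch on which
    of its (at most d) elements joins the solution. The recursion depth is at
    most k and the fan-out at most d, so there are at most d**k leaves.
    """
    unhit = next((s for s in sets if not (s & chosen)), None)
    if unhit is None:
        return True       # every set is already hit
    if k == 0:
        return False      # budget exhausted while some set is still unhit
    return any(has_hitting_set(sets, k - 1, chosen | {x}) for x in unhit)

def has_hitting_set_with(sets, k, t):
    """Is there a hitting set of size at most k that contains the element t?"""
    return k >= 1 and has_hitting_set(sets, k - 1, frozenset({t}))

# Hypothetical 2-bounded family (a path a - b - c - d seen as edges).
family = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"c", "d"})]
```

The running time of this sketch depends exponentially only on $k$ (and on $d$), and polynomially on the number of sets, which is the shape of bound required by the FPT definition.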

This result and the corresponding algorithm sketched in its proof show that the higher the required responsibility degree, the lower the computational effort needed to compute the actual causes with at least that level of responsibility. In other terms, parameterized algorithms are effective for computing actual causes with high responsibility or most responsible causes. In general, parameterized algorithms are very effective when the parameter is relatively small |2H].

Now, in order to compute most responsible causes, we could apply, for each actual cause $t$, the just presented FPT algorithm on the hypergraph $\mathfrak{G}^n(D)$, starting with $k = 1$, i.e. asking if there is a vertex cover of size at most 1 that contains $t$. If the algorithm returns a positive result, then $t$ is a counterfactual cause, and has responsibility 1. Otherwise, the algorithm will be launched with $k = 2, 3, \ldots, |D^n|$, until a positive result is returned. (The procedure can be improved through binary search on $k = 1, 2, 3, \ldots, m$, with $m$ possibly much smaller than $|D|$.)
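The search over $k$ just described can be sketched as follows. A brute-force test stands in for the FPT algorithm, and the binary search is valid because the test is monotone in $k$. The function names and the toy set family are illustrative assumptions; the test asks for a hitting set of size at most $k$ that contains $t$.

```python
from itertools import combinations

def cover_with_t_exists(sets, t, k):
    """Brute-force stand-in for the FPT test: is there a hitting set of size
    at most k that contains t?"""
    universe = sorted(set().union(*sets) - {t})
    return any(
        all(s & ({t} | set(rest)) for s in sets)
        for size in range(0, k)             # at most k - 1 further elements
        for rest in combinations(universe, size)
    )

def smallest_k(sets, t, upper):
    """Least k in 1..upper accepted by the test, or None if there is none."""
    if not cover_with_t_exists(sets, t, upper):
        return None
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if cover_with_t_exists(sets, t, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Hypothetical family of witness sets.
family = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"c", "d"})]
```

Following the correspondence used in the text, a returned value $k$ would be read as a responsibility of $1/k$ for $t$; a value of 1 corresponds to a counterfactual cause.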

The complexity results and algorithms provided in this section can be extended to UBCQs. This is due to Remark 5 and the construction of $\mathfrak{G}^n(D)$, which the results in this section build upon.

For the d-hitting-set problem there are also efficient parameterized approx¬ 
imation algorithms m- They could be used to approximate the responsibility 
problem. Furthermore, approximation algorithms developed for the minimum 
vertex cover problem on bounded hypergraphs msn should be applicable 
to approximate most responsible causes for query answers. Via the causal¬ 
ity/repair connection (cf. Section 02]), it should be possible to develop ap¬ 
proximation algorithms to compute S-repairs of particular sizes, C-repairs, 
and consistent query answers with respect to DCs. 


6.2 Complexity of diagnosis with positive disjunctive rules 

It is known that consistency-based diagnosis decision problems can be unsolvable [S3]. There are decidable classes of FO diagnosis specifications, and those classes are amenable to complexity analysis. However, there is little research on the complexity analysis of solvable classes of consistency-based diagnosis problems. The connection we established in the previous sections between causality, repairs and consistency-based diagnosis can be used to obtain new algorithmic and complexity results for the latter. Without trying to be exhaustive about this, which is beyond the scope of this paper, we give an example of the kind of results that can be obtained.

Considering the diagnosis problem we obtained in SectionjS] we can define a 
class of diagnosis problems. Cf. ExampledT] in particular (II2L for motivation. 

Definition 8 A disjunctive positive (DP) diagnosis specification $\Sigma$ is a consistent FO logical theory, such that:

(a) $\Sigma$ has a signature (schema) consisting of a finite set of constants, a set of predicates $\mathcal{S}$, a set of predicates $\mathcal{S}^{ab}$ of the form $Ab_R$, with $R \in \mathcal{S}$, and $Ab_R$ with the same arity as $R$. $\mathcal{S}$ and $\mathcal{S}^{ab}$ are mutually disjoint.

(b) $\Sigma$ is inconsistent with $AB^{\mathcal{S}} := \{\forall \bar{x}(Ab_R(\bar{x}) \rightarrow \mathbf{false}) \mid R \in \mathcal{S}\}$.

(c) Consists of:

(c1) Sentences of the form $\forall \bar{x}(C(\bar{x}) \rightarrow \bigvee_i Ab_{R_i}(\bar{x}_i))$, with $\bar{x}_i \subseteq \bar{x}$, and $C(\bar{x})$ a conjunction of atoms that does not include $Ab$-atoms of any kind.

(c2) Sentences of the form $\forall \bar{x}(Ab_R(\bar{x}) \rightarrow (R(\bar{x}) \wedge S(\bar{x})))$, with $S \in \mathcal{S}$.

(c3) A finite background universal theory $T$ expressed in terms of predicates in $\mathcal{S}$ (and constants) that has a unique Herbrand model. □

Or any other “abducible” predicates that are different from those in $\mathcal{S}$.

As above, a diagnosis is a set of $Ab_R$-atoms that, when assumed to be true, restores the consistency of the correspondingly modified $\Sigma \cup AB^{\mathcal{S}}$.

There are at least two important computational tasks that emerge, namely, given a disjunctive positive (DP) diagnosis specification $\Sigma$ together with $AB^{\mathcal{S}}$:

1. The minimum-cardinality diagnosis (MCD) problem, about computing minimum-cardinality diagnoses.

2. The minimal membership diagnosis (MMD) problem, about computing minimum-cardinality diagnoses that contain a given $Ab$-atom.

It is not difficult to see that these problems are computable (or solvable in their decision versions). Now we can obtain complexity lower bounds for them. Actually, in Section [SJ the responsibility and most responsible causes problems were reduced to diagnosis problems for specifications that turned out to be disjunctive positive (see dm).

More specifically, Proposition IH] reduces computing the responsibility of a tuple to computing the size of a minimum-cardinality diagnosis that contains the tuple. Furthermore, as a simple corollary of Proposition [51 we obtain that the computation of minimum-cardinality diagnoses allows us to compute most responsible causes. Now, combining all this with Proposition 12 and Theorem m we obtain the following lower bounds for our diagnosis problems.

Theorem 5 For disjunctive positive diagnosis specifications, the MCD and MMD problems are $FP^{NP(\log(n))}$-hard in the size of their underlying Herbrand structure. □


7 Preferred Causes for Query Answers 

In Section 3 we characterized causes and most responsible causes in terms of S-repairs and C-repairs, resp. We could generalize the notion of a cause and/or its responsibility by using, in principle, any repair semantics $\mathsf{S}$. The latter is represented by a class of repairs $Rep^{\mathsf{S}}(D, \Sigma)$ of $D$ with respect to a set $\Sigma$ of denial constraints (cf. Section 2.2). When dealing with (sets of) DCs, the repair actions can only be of certain kinds. Usually tuple deletions have been considered. This is the case of the S- and C-repairs we have considered in this work so far.

We could go beyond and consider the notion of prioritized repair. Also changes of attribute values can be the chosen repair actions, including the use of null values, to “destroy” joins (again, with different semantics, e.g. with nulls à la SQL [T51I5]).


This condition is clearly satisfied by the logical reconstruction of a relational database, but can be relaxed in several ways.

In this section we explore the possibility of introducing a notion of preferred cause that is based on a given repair semantics. This idea is inspired by (and generalizes) the characterization of causes in terms of repairs that we obtained before, namely ((J), (H)), Proposition [U and Corollary [TJ

If we define causes and their (minimal) contingency sets on the basis of a given repair semantics, the minimality condition involved in the latter will have an impact on the notion of minimal (or preferred) contingency set, and indirectly, on the notions of responsibility and most responsible cause.

In Section 7.1 we summarize prioritized repairs. In Section 7.2 we impose preferences on causes on the basis of the prioritized repairs introduced in [59] (and further investigated in [IS]). In Section 7.3 we briefly investigate the possibility of capturing endogenous repairs, i.e. repairs that do not change exogenous tuples, by means of a priority relation. Finally, in Section 7.4 we briefly consider the possibility of defining (preferred) causes via attribute-based repairs that use null values.


7.1 Prioritized repairs 

The prioritized repairs in [53] are based on a priority relation $\succ$ on the set of database tuples. In the case of a pair of (mutually) conflicting tuples, i.e. tuples that simultaneously violate a constraint in a given set of DCs (possibly in company of other tuples), the repair process reflects the user preference, as captured by the priority relation, on the tuples that are privileged to be kept in the database, i.e. in the intended repairs.

Given such a priority relation, in [59] different classes of prioritized repairs are introduced, namely the class of globally optimal repairs, that of Pareto-optimal repairs, and that of completion-optimal repairs. Intuitively, each class relies on a different optimality criterion that is used to extend the priority relation on pairs of conflicting facts to a priority relation on the set of S-repairs. As a consequence, each of these three classes is contained in that of the S-repairs. In particular, all these repairs are based on tuple deletions.

Let us denote with $Rep^{\succ, X}(D, \Sigma)$ the class of all prioritized repairs based on $\succ$ and the optimality criterion $X$. Its elements are called $(\succ, X)$-prioritized repairs of $D$ with respect to the set $\Sigma$ of DCs. It holds $Rep^{\succ, X}(D, \Sigma) \subseteq Srep(D, \Sigma)$, and then, all the elements of $Rep^{\succ, X}(D, \Sigma)$ are subsets of $D$.

In order to show a concrete class $Rep^{\succ, X}(D, \Sigma)$, we first recall the definitions of priority relation and global-optimal repair from m-


We could say that the efforts in [35, 36] to modify the Halpern-Pearl (HP) original definition of causality are about considering more appropriate restrictions on contingencies. Since in some cases the original HP definition does not provide intuitive results regarding causality, the modifications avoid this by recognizing some contingencies as “unreasonable” or “farfetched”.

Definition 9 Given an instance $D$ and a set of denial constraints $\Sigma$, a binary relation $\succ$ on $D$ is a priority relation with respect to $\Sigma$ if: (a) $\succ$ is acyclic, and (b) for every $t, t' \in D$, if $t \succ t'$, then $t$ and $t'$ are mutually conflicting. □

Definition 10 Let $D$ be an instance, $\Sigma$ a set of DCs, and $\succ$ a corresponding priority relation. Let $D'$ and $D''$ be two consistent sub-instances of $D$. $D'$ is a global improvement of $D''$ if $D' \neq D''$, and for every tuple $t' \in D'' \smallsetminus D'$, there exists a tuple $t \in D' \smallsetminus D''$ such that $t \succ t'$. $D'$ is a global-optimal repair of $D$, if $D'$ is an S-repair and does not have a global improvement. □

In this definition, the optimality criterion, a possible $X$ above, is that of global-optimal repair, or $(\succ, go)$-repair, which leads to a class $Rep^{\succ, go}(D, \Sigma)$. We consider this repair semantics just for illustration purposes.

Example 17 Consider the database schema Author(Name, Journal), Journal(JournalN, Paper#, Topic), and the following instance $D$:

Author                  Journal
Name    Journal         JournalN    Paper#    Topic
John    TKDE            TKDE        30        XML
Tom     TKDE            TKDE        31        CUBE
John    TODS            TODS        32        XML

Consider the following denial constraint:

$$\kappa\colon\ \forall x y z z'\, \neg(Author(x, y) \wedge Journal(y, z, z') \wedge x = \text{John} \wedge z' = \text{XML}), \qquad (21)$$

capturing the condition that “John has not published a paper in a journal that has published papers on XML”.

$D$ is inconsistent with respect to $\kappa$, and contains the following sets of conflicting tuples:

$C_1$ = {Author(John, TKDE), Journal(TKDE, 30, XML)},

$C_2$ = {Author(John, TODS), Journal(TODS, 32, XML)}.

$D$ has the following S-repairs, each obtained by deleting one tuple from each of $C_1$ and $C_2$, to resolve the conflicts:

$D_1$ = {Author(Tom, TKDE), Journal(TKDE, 31, CUBE), Author(John, TODS), Journal(TKDE, 30, XML)}

$D_2$ = {Author(Tom, TKDE), Journal(TKDE, 31, CUBE), Journal(TKDE, 30, XML), Journal(TODS, 32, XML)}

$D_3$ = {Author(Tom, TKDE), Journal(TKDE, 31, CUBE), Author(John, TKDE), Journal(TODS, 32, XML)}

$D_4$ = {Author(Tom, TKDE), Journal(TKDE, 31, CUBE), Author(John, TKDE), Author(John, TODS)}

We can say $\{t, t'\}$ is a conflict, i.e. the two tuples jointly participate in the violation of one of the DCs in $\Sigma$.

(a) Now, assume a user prefers to resolve a conflict by removing tuples from the Author table rather than the Journal table, maybe because he considers the latter more reliable than the former. This is expressed by the following priority relationships on conflicting tuples: Journal(TKDE, 30, XML) $\succ$ Author(John, TKDE) and Journal(TODS, 32, XML) $\succ$ Author(John, TODS).

In this case only $D_2$ is a global-optimal repair. Actually, $D_2$ is a global improvement over each of $D_1$, $D_3$ and $D_4$. For $D_1$, for example: $D_2 \smallsetminus D_1$ = {Journal(TODS, 32, XML)} and $D_1 \smallsetminus D_2$ = {Author(John, TODS)}. We can see that, for each tuple in $D_1 \smallsetminus D_2$, there is a tuple in $D_2 \smallsetminus D_1$ that has a higher priority. Therefore, $D_2$ is a global improvement on $D_1$. So, in this case $Rep^{\succ, go}(D, \kappa) = \{D_2\}$.

In this case, the uniqueness of the global-optimal repair is quite natural as the preference relation among conflicting tuples is a total relation. So, we know how to resolve every conflict according to the user preferences.

(b) For a more subtle situation, assume the user has the priorities as before, but in addition he tends to believe that John has a paper in TODS. In this case we have only the relationship Journal(TKDE, 30, XML) $\succ'$ Author(John, TKDE), and no preference for resolving the second conflict. Now both $D_1$ and $D_2$ are global-optimal repairs. That is, now $Rep^{\succ', go}(D, \kappa) = \{D_1, D_2\}$. □
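The two cases of this example can be replayed mechanically. The sketch below is a brute-force illustration of Definition 10 on the instance of Example 17; the encoding of tuples as Python tuples, the hard-coded conflict sets, and all function and constant names are assumptions made for the illustration.

```python
from itertools import combinations

A_JK = ("Author", "John", "TKDE")
A_TK = ("Author", "Tom", "TKDE")
A_JT = ("Author", "John", "TODS")
J_30 = ("Journal", "TKDE", 30, "XML")
J_31 = ("Journal", "TKDE", 31, "CUBE")
J_32 = ("Journal", "TODS", 32, "XML")
D = frozenset({A_JK, A_TK, A_JT, J_30, J_31, J_32})
CONFLICTS = [frozenset({A_JK, J_30}), frozenset({A_JT, J_32})]  # C1, C2

def consistent(inst):
    """An instance is consistent if it contains no conflict set entirely."""
    return not any(c <= inst for c in CONFLICTS)

SUBINSTANCES = [frozenset(c) for r in range(len(D) + 1)
                for c in combinations(D, r)]
CONSISTENT = [s for s in SUBINSTANCES if consistent(s)]
# S-repairs: consistent sub-instances that are maximal under set inclusion.
S_REPAIRS = [s for s in CONSISTENT if not any(s < o for o in CONSISTENT)]

def globally_improves(d1, d2, prio):
    """d1 is a global improvement of d2; (t, u) in prio means t has priority over u."""
    return d1 != d2 and all(
        any((t, u) in prio for t in d1 - d2) for u in d2 - d1)

def global_optimal(prio):
    """S-repairs that have no global improvement among consistent sub-instances."""
    return {r for r in S_REPAIRS
            if not any(globally_improves(c, r, prio) for c in CONSISTENT)}

PRIO_A = {(J_30, A_JK), (J_32, A_JT)}   # case (a)
PRIO_B = {(J_30, A_JK)}                 # case (b)
D1 = D - {A_JK, J_32}
D2 = D - {A_JK, A_JT}
```

With these inputs, `global_optimal(PRIO_A)` and `global_optimal(PRIO_B)` can be compared against the classes stated in cases (a) and (b).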


7.2 Preferred causes from prioritized repairs 

According to the motivation provided at the beginning of this section, we now define preferred causes on the basis of a class of prioritized repairs. (Compare (22) below with o and @.) To keep things simple, we concentrate on single BCQs, $Q$, whose associated denial constraints are denoted by $\kappa(Q)$.

Before providing technical details, we motivate the notion of preference in the context of causality. In this direction, first notice that under actual causality, we already make a difference (and only this difference) between endogenous and exogenous tuples. We can think of extending this priority relation among tuples in such a way that, for example, we prioritize, as causes, tuples in a given relation R, and we are not interested in tuples in another relation S. So, the user can specify a priority relation between the two relations, or different scores for these relations [33].

In Section H?^ actual causes and their minimal contingency sets for a UBCQ were characterized as the minimal hitting-sets of the collection $\mathcal{C}$ of minimal subsets of a database that entail the query. Those minimal hitting-sets are obtained by removing at least one tuple from each of the elements of $\mathcal{C}$ (cf. Proposition Hj). At this point, user preferences, or priorities, could be applied to tuples that belong to the same set $C$.

Definition 11 Given an instance $D$ and a BCQ $Q$, tuples $t$ and $t'$ are jointly-contributing if $t \neq t'$, and there exists an S-minimal $\Lambda \subseteq D$ such that $\Lambda \models Q$ and $t, t' \in \Lambda$. □

Now we define priority relations on jointly-contributing tuples.

Definition 12 Given an instance $D$ and a BCQ $Q$, a binary relation $\succ_c$ on $D$ is a causal priority relation with respect to $Q$ if: (a) $\succ_c$ is acyclic, and (b) for every $t, t' \in D$, if $t \succ_c t'$, then $t$ and $t'$ are jointly-contributing tuples. □

This definition introduces a natural notion of preference on causality. Actually, this way of approaching priorities on causes is in (inverse) correspondence with preference on repairs as based on priority relations on conflicting tuples. To see this, first observe that for a given instance $D$ and BCQ $Q$: $t$ and $t'$ are jointly-contributing tuples for $Q$ iff $t$ and $t'$ are mutually conflicting tuples for $\kappa(Q)$.

Next, in the context of prioritized repairs, a priority relation reflects a user preference on tuples that are preferred to be kept in the database. This is the inverse of causality, where a causal priority relation, as we defined it, reflects the tuples that are preferred to be (hypothetically or counterfactually) removed from the database, to make them preferred causes.

In the following assume $\succ$ is the inverse of a causal priority relation $\succ_c$. That is, $t \succ t'$ iff $t' \succ_c t$. Clearly, $\succ$ is acyclic, and can be imposed, with the expected result, on pairs of conflicting tuples. As a consequence, $\succ$ can be used to define prioritized repairs.

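Definition 13 Let $D$ be an instance, $Q$ a BCQ, $t$ a tuple in $D$, $\succ_c$ a causal priority relation on $D$'s tuples.

(a) $$Diff^{\succ_c, X}(D, \kappa(Q), t) := \{D \smallsetminus D' \mid D' \in Rep^{\succ, X}(D, \kappa(Q)), \text{ and } t \in D \smallsetminus D'\}. \qquad (22)$$

(b) $t \in D$ is a $(\succ_c, X)$-preferred cause for $Q$ iff $Diff^{\succ_c, X}(D, \kappa(Q), t) \neq \emptyset$. □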

Notice that every $(\succ_c, X)$-preferred cause is also an actual cause. This follows from Proposition 1 and the fact that prioritized repairs are also S-repairs.

Similarly to Proposition 2, for each $\Lambda \in Diff^{\succ_c, X}(D, \kappa(Q), t)$, it holds that $t \in \Lambda$, $t$ is a $(\succ_c, X)$-preferred cause, and also an actual cause for $Q$ with S-minimal contingency set $\Lambda \smallsetminus \{t\}$. In particular, $t$'s responsibility can be defined and computed as before, but now restricting its contingency sets to those of the form $\Lambda \smallsetminus \{t\}$, with $\Lambda \in Diff^{\succ_c, X}(D, \kappa(Q), t)$. In this way, a causal priority relation may affect the responsibility of a cause (with respect to the non-prioritized case).

Example 18 (Example 17 cont.) The following BCQ $Q$ is true in $D$:

$$\exists JournalN\ \exists Paper\#\,(Author(\text{John}, JournalN) \wedge Journal(JournalN, Paper\#, \text{XML}));$$

and its associated DC $\kappa(Q)$ is $\kappa$ in (21).

We want to obtain the preferred causes for $Q$ being, possibly unexpectedly, true in $D$, with the following preferences: (a) We prefer those among the Author tuples. (b) It is likely that John does have a paper in TODS. So, we prefer Author(John, TODS) not to be the cause.

These causal priorities are in inverse correspondence with those in the second case of Example 17(b) about priorities for repairs. That is, for our causal priority relation $\succ_c$ here, its inverse is $\succ'$ in Example 17(b). There we had $Rep^{\succ', go}(D, \kappa(Q)) = \{D_1, D_2\}$, which we can use to apply Definition 13.

We obtain as the globally-optimal causes, i.e. as $(\succ_c, go)$-causes: Author(John, TKDE), Journal(TODS, 32, XML) and Author(John, TODS), all with the same responsibility, $\frac{1}{2}$. □
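The computation in this example can be traced with a few lines of code. The sketch below takes the two global-optimal repairs $D_1$ and $D_2$ of Example 17(b) as given, forms the difference sets of Definition 13, and reads off a responsibility of $1/|\Lambda|$ for a smallest difference set $\Lambda$ containing the tuple (i.e. $1/(1+|\Lambda \smallsetminus \{t\}|)$, which is an assumption about the responsibility measure carried over from the non-prioritized case). All names are illustrative.

```python
from fractions import Fraction

A_JK = ("Author", "John", "TKDE")
A_TK = ("Author", "Tom", "TKDE")
A_JT = ("Author", "John", "TODS")
J_30 = ("Journal", "TKDE", 30, "XML")
J_31 = ("Journal", "TKDE", 31, "CUBE")
J_32 = ("Journal", "TODS", 32, "XML")
D = frozenset({A_JK, A_TK, A_JT, J_30, J_31, J_32})

# Global-optimal repairs taken from Example 17(b).
D1 = D - {A_JK, J_32}
D2 = D - {A_JK, A_JT}

def diff_sets(db, repairs, t):
    """The sets db minus repair that contain t, for the given repairs."""
    return [db - r for r in repairs if t in db - r]

def preferred_causes(db, repairs):
    """Map each preferred cause to 1/|L| for a smallest difference set L."""
    result = {}
    for t in db:
        diffs = diff_sets(db, repairs, t)
        if diffs:
            result[t] = Fraction(1, min(len(d) for d in diffs))
    return result

causes = preferred_causes(D, [D1, D2])
```

Tuples that are deleted by none of the given repairs, such as Journal(TKDE, 30, XML), do not appear among the preferred causes.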

Notice that Definition 13 can be easily extended to UBCQs. This is done, as earlier in this work, by considering the set $\Sigma$ of denial constraints associated to a UBCQ. In the other direction, we recall that if we start with a set of DCs $\Sigma$, the corresponding UBCQ is denoted with $V^{\Sigma}$.

As we did in the previous sections of this work, we could take advantage of algorithmic and complexity results about prioritized repairs IMM15], to obtain complexity results for preferred causes problems. As an example, we establish the complexity of the minimal contingency set decision problem for $(\succ_c, go)$-preferred causes. More precisely, for an instance $D$ and a UBCQ $Q$, the minimal preference-contingency set (decision) problem is about deciding if a set of tuples $\Gamma$ is an S-minimal contingency set associated to a $(\succ_c, go)$-preferred cause $t$.

Notation: $Cont^{\succ_c, X}(D, Q, t) := \{\Lambda \smallsetminus \{t\} \mid \Lambda \in Diff^{\succ_c, X}(D, \kappa(Q), t)\}$ is the class of all S-minimal contingency sets for a $(\succ_c, X)$-preferred cause $t$.

Definition 14 For a UBCQ $Q$, the minimal preference-contingency set decision problem is about membership of:

$$\mathcal{MPCDP}(Q) := \{(D, \succ_c, t, \Gamma) \mid t \in D,\ \Gamma \subseteq D, \text{ and } \Gamma \in Cont^{\succ_c, go}(D, Q, t)\}.$$

□


From Definition m there is a close connection between $\mathcal{MPCDP}$ and the global-optimal repair checking problem, i.e. about deciding if an instance $D'$ is a $(\succ, go)$-repair of $D$ with respect to a set of denial constraints. If we accept functional dependencies (FDs) among our denial constraints (and then, UBCQs that involve inequalities), the following result can be obtained from the NP-completeness of globally-optimal repair checking for FDs.

Proposition 14 For a UBCQ $Q$ with inequalities, $\mathcal{MPCDP}(Q)$ is NP-hard.

Proof It is good enough to reduce globally-optimal repair checking to our contingency checking problem. So, consider an inconsistent instance $D$ with respect to a set of denial constraints $\Sigma$, a priority relation for repairs $\succ$, and $D' \subseteq D$. To check if $D' \in Rep^{\succ, go}(D, \Sigma)$ we can check, for an arbitrary element $t \in D \smallsetminus D'$, if $(D, \succ_c, t, D \smallsetminus (D' \cup \{t\})) \in \mathcal{MPCDP}(V^{\Sigma})$, with $\succ_c$ the inverse of $\succ$. □

It is worth contrasting this result with the tractability result in Proposition 11 for the minimal contingency set decision problem (MCSDP) for actual causes. Notice that Proposition 11 still holds for UBCQs with inequality.

Notice that we could generalize the notion of preferred cause by appealing to any notion of repair. More precisely, if we have a repair semantics rSem (based on tuple deletions for DCs), we could replace $Rep^{\succ, X}(D, \kappa(Q))$ in (22) by $Rep^{rSem}(D, \kappa(Q))$. However, to obtain the intended results for causes, we have to be careful, as above, about a possible inverse relationship between preference on repairs and preference on causes.


7.3 Endogenous repairs 

The partition of a database into endogenous and exogenous tuples that is 
used in the causality setting may also be of interest in the context of repairs. 
Considering that we should have more control on endogenous tuples than 
on exogenous ones, which may come from external sources, it makes sense to 
consider endogenous repairs, which would be obtained by updates (of any kind) 
on endogenous tuples only. (Of course, a symmetric treatment of “exogenous” 
repairs is also possible; what is relevant here is the partition.) 

For example, in the case of DCs, endogenous repairs would be obtained by deleting endogenous tuples only. More formally, given $D = D^n \cup D^x$, possibly inconsistent with a set of DCs $\Sigma$, an endogenous repair $D'$ of $D$ is a maximally consistent sub-instance of $D$ with $D \smallsetminus D' \subseteq D^n$, i.e. $D'$ keeps all the exogenous tuples of $D$. If endogenous repairs form the class $Srep^n(D, \Sigma)$, it holds $Srep^n(D, \Sigma) \subseteq Srep(D, \Sigma)$.

Example 19 Consider $D = D^n \cup D^x$, with $D^n = \{R(a_2, a_1), R(a_4, a_3), S(a_3), S(a_4)\}$ and $D^x = \{R(a_3, a_3), S(a_2)\}$, and the DC $\kappa\colon\ \neg\exists x y (S(x) \wedge R(x, y) \wedge S(y))$.

Here, $Srep(D, \kappa) = \{D_1, D_2, D_3\}$, with $D_1 = \{R(a_2, a_1), R(a_4, a_3), R(a_3, a_3), S(a_4), S(a_2)\}$, $D_2 = \{R(a_2, a_1), S(a_3), S(a_4), S(a_2)\}$, and $D_3 = \{R(a_2, a_1), R(a_4, a_3), S(a_3), S(a_2)\}$. The only endogenous S-repair is $D_1$. □
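The example can be reproduced by brute force. The sketch below enumerates the S-repairs of the instance with respect to the denial constraint and then keeps those that delete endogenous tuples only. The tuple encoding and the function names are assumptions made for the illustration.

```python
from itertools import combinations

ENDO = frozenset({("R", "a2", "a1"), ("R", "a4", "a3"), ("S", "a3"), ("S", "a4")})
EXO = frozenset({("R", "a3", "a3"), ("S", "a2")})
D = ENDO | EXO

def consistent(inst):
    """The DC holds: there are no x, y with S(x), R(x, y) and S(y) all present."""
    s_vals = {t[1] for t in inst if t[0] == "S"}
    return not any(t[0] == "R" and t[1] in s_vals and t[2] in s_vals
                   for t in inst)

def s_repairs(db):
    """Maximal (under set inclusion) consistent sub-instances of db."""
    subs = [frozenset(c) for r in range(len(db) + 1)
            for c in combinations(db, r)]
    good = [s for s in subs if consistent(s)]
    return [s for s in good if not any(s < o for o in good)]

def endogenous_repairs(db, endo):
    """S-repairs whose deleted tuples are all endogenous."""
    return [r for r in s_repairs(db) if db - r <= endo]
```

Filtering the S-repairs is enough here because a maximally consistent sub-instance that keeps all exogenous tuples is also maximal among all consistent sub-instances.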

In this section, without trying to be exhaustive or detailed, we consider the possibility of defining endogenous repairs on the basis of a suitable priority relation $\succ$ on tuples, while at the same time taking advantage of the go optimality condition considered in Section 7.1.

First, if we assume that relation $\succ'$, the extension of $\succ$, is such that $t \succ' t'$ when $t \in D^x$ and $t' \in D^n$ ($\succ'$ is $\succ$ if the latter already has this property), then it is easy to verify that every endogenous S-repair globally improves any non-endogenous S-repair. As a consequence, if there is an endogenous S-repair, then all the $(\succ', go)$-repairs are endogenous. Notice that the extension may destroy the acyclicity assumption on the priority relation, because we are starting from a given (acyclic) relation $\succ$, which we are now extending.

Pairs of conflicting tuples would inherit the priority relationships from the general priority relation.

Of course, we could use other optimality criteria at this point, but considering all possibilities is beyond the scope of this work.

It might be the case that there is no endogenous S-repair, in which case non-endogenous S-repairs would not be improved by an endogenous one. So, if we want to prevent the existence of non-endogenous repairs, we can add an extra, dummy predicate $D(\cdot)$ to the schema, and the endogenous tuple $D(d)$ to $D$. We modify every DC in $\Sigma$, say $\kappa\colon\ \leftarrow C(\bar{x})$, by adding an extra, dummy condition: $\kappa^d\colon\ \leftarrow D(d), C(\bar{x})$, obtaining a set $\Sigma^d$ of DCs. In this case, the S-repairs will be: $D^d := D \smallsetminus \{D(d)\}$, which is endogenous, and also all those S-repairs of $D$ with respect to $\Sigma$ (now each including $D(d)$). The latter are all non-endogenous. If we assume that $t \succ' D(d)$, for every $t \in D^x$, then every non-endogenous S-repair will be improved by $D^d$, and will not be considered.


7.4 Null-based causes 

Consider an instance $D = \{R(c_1, \ldots, c_n), \ldots\}$ that may be inconsistent with
respect to a set of DCs. The allowed repair updates are changes of attribute
values by the constant null. We assume that null does not join with any other
value, including null itself.

In order to keep track of changes, we may introduce numbers as first
arguments in tuples, as global tuple identifiers (ids). So, $D$ becomes $D =
\{R(1; c_1, \ldots, c_n), \ldots\}$. Assume that $id(t)$ returns the id of the tuple $t \in D$. For
example, $id(R(1; c_1, \ldots, c_n)) = 1$.

If, by updating $D$ into $D'$ in this way, the value of the $i$th attribute in
the $R$-tuple with id $k$ is changed to null, then the change is captured as the
string $R[k; i]$. These strings are collected forming the set $Diff^{null}(D, D')$.
For example, if $D = \{R(1; a, b), S(2; c, d), S(3; e, f)\}$ is changed into
$D' = \{R(1; a, null), S(2; null, d), S(3; null, null)\}$, we have
$Diff^{null}(D, D') = \{R[1; 2], S[2; 1], S[3; 1], S[3; 2]\}$.
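
The bookkeeping just described can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the formal development: an instance is encoded as a dictionary from tuple ids to a relation name and a value tuple, null is represented by `None`, and the name `diff_null` is hypothetical.

```python
from typing import Dict, Set, Tuple

NULL = None  # assumed stand-in for the constant null

# Assumed encoding: tuple id -> (relation name, attribute values without the id).
Instance = Dict[int, Tuple[str, Tuple[object, ...]]]


def diff_null(d: Instance, d_prime: Instance) -> Set[Tuple[str, int, int]]:
    """Collect (relation, tuple id, attribute position) for every value set to null."""
    changes = set()
    for tid, (rel, values) in d.items():
        _, new_values = d_prime[tid]
        # Attribute positions are 1-based, as in the strings R[k; i].
        for pos, (old, new) in enumerate(zip(values, new_values), start=1):
            if old is not NULL and new is NULL:
                changes.add((rel, tid, pos))
    return changes


# The instance D and its update D' from the example in the text.
D = {1: ("R", ("a", "b")), 2: ("S", ("c", "d")), 3: ("S", ("e", "f"))}
D_PRIME = {1: ("R", ("a", NULL)), 2: ("S", (NULL, "d")), 3: ("S", (NULL, NULL))}

print(sorted(diff_null(D, D_PRIME)))
```

Each triple `(rel, k, i)` plays the role of the string $R[k; i]$; a set is used because every position of a tuple can be changed at most once.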

A null-repair of $D$ with respect to a set of DCs $\Sigma$ is a consistent instance
$D'$, such that $Diff^{null}(D, D')$ is minimal under set inclusion. $Rep^{null}(D, \Sigma)$
denotes the class of null-based repairs of $D$ with respect to $\Sigma$.

Example 20 (example [TO] cont.) Consider the following inconsistent instance 
with respect to the DC $\kappa\colon \neg\exists x \exists y\,(S(x) \wedge R(x, y) \wedge S(y))$:

$D = \{R(1; a_2, a_1), R(2; a_3, a_3), R(3; a_4, a_3), S(4; a_2), S(5; a_3), S(6; a_4)\}$.

For simplicity, we do not make any difference between endogenous and
exogenous tuples. Here, the class of null-based repairs, $Rep^{null}(D, \kappa)$, is formed
by:

$D_1 = \{R(1; a_2, a_1), R(2; a_3, a_3), R(3; a_4, a_3), S(4; a_2), S(5; null), S(6; a_4)\}$,
$D_2 = \{R(1; a_2, a_1), R(2; null, a_3), R(3; a_4, null), S(4; a_2), S(5; a_3), S(6; a_4)\}$,
$D_3 = \{R(1; a_2, a_1), R(2; null, a_3), R(3; a_4, a_3), S(4; a_2), S(5; a_3), S(6; null)\}$,
$D_4 = \{R(1; a_2, a_1), R(2; a_3, null), R(3; a_4, null), S(4; a_2), S(5; a_3), S(6; a_4)\}$,
$D_5 = \{R(1; a_2, a_1), R(2; a_3, null), R(3; null, a_3), S(4; a_2), S(5; a_3), S(6; a_4)\}$,
$D_6 = \{R(1; a_2, a_1), R(2; a_3, null), R(3; a_4, a_3), S(4; a_2), S(5; a_3), S(6; null)\}$.

Here, $Diff^{null}(D, D_2) = \{R[2; 1], R[3; 2]\}$, and $Diff^{null}(D, D_3) = \{R[2; 1],
S[6; 1]\}$. □
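
As a sanity check on Example 20, the following sketch tests, for each listed instance, that it satisfies the DC and that undoing any single null makes it inconsistent again. The encoding (dictionaries, `None` for null, triples for the strings $R[k; i]$) and the function names are assumptions of this illustration. Checking single removals suffices for set-inclusion minimality only because, for this DC, turning more values into null can never create a new violation.

```python
from typing import Dict, FrozenSet, Tuple

NULL = None  # assumed stand-in for null; it is never allowed to join

Instance = Dict[int, Tuple[str, Tuple[object, ...]]]
Change = Tuple[str, int, int]  # (relation, tuple id, attribute position)

# The inconsistent instance D of Example 20.
D: Instance = {
    1: ("R", ("a2", "a1")), 2: ("R", ("a3", "a3")), 3: ("R", ("a4", "a3")),
    4: ("S", ("a2",)), 5: ("S", ("a3",)), 6: ("S", ("a4",)),
}

# Diff^null(D, D_k) for the six instances D_1, ..., D_6 listed in the example.
DIFFS: Dict[str, FrozenSet[Change]] = {
    "D1": frozenset({("S", 5, 1)}),
    "D2": frozenset({("R", 2, 1), ("R", 3, 2)}),
    "D3": frozenset({("R", 2, 1), ("S", 6, 1)}),
    "D4": frozenset({("R", 2, 2), ("R", 3, 2)}),
    "D5": frozenset({("R", 2, 2), ("R", 3, 1)}),
    "D6": frozenset({("R", 2, 2), ("S", 6, 1)}),
}


def apply_nulls(d: Instance, changes: FrozenSet[Change]) -> Instance:
    """Return a copy of d in which every position named in changes is null."""
    updated: Instance = {}
    for tid, (rel, values) in d.items():
        updated[tid] = (rel, tuple(
            NULL if (rel, tid, pos) in changes else value
            for pos, value in enumerate(values, start=1)
        ))
    return updated


def violates_kappa(d: Instance) -> bool:
    """True if S(x), R(x, y), S(y) has a match; null values are excluded from joins."""
    s_values = {vals[0] for rel, vals in d.values()
                if rel == "S" and vals[0] is not NULL}
    return any(rel == "R" and vals[0] in s_values and vals[1] in s_values
               for rel, vals in d.values())


def is_null_repair(d: Instance, changes: FrozenSet[Change]) -> bool:
    """Consistent after the changes, and inconsistent if any one change is undone."""
    if violates_kappa(apply_nulls(d, changes)):
        return False
    return all(violates_kappa(apply_nulls(d, changes - {c})) for c in changes)


for name, changes in DIFFS.items():
    print(name, is_null_repair(D, changes))
```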


An alternative, but equivalent formulation can be found in [8]. 
According to the motivation provided at the beginning of this section, we
can now define causes appealing to the class of null-based repairs of $D$. Since
repair actions in this case are attribute-value changes, causes can be defined
at both the tuple and attribute levels. The same applies to the definition of
responsibility (in this case generalizing Proposition [5]).

Definition 15 Let $D$ be an instance, $Q$ a BCQ, and $t \in D$ a tuple of the
form $R(i; c_1, \ldots, c_n)$.

(a) $R[i; c_j]$ is a null-based attribute-value cause for $Q$ if there is
$D' \in Rep^{null}(D, \kappa(Q))$ with $R[i; j] \in Diff^{null}(D, D')$.

(That is, the value $c_j$ for attribute $A_j$ in the tuple is a cause if it is changed
into a null in some repair.)

(b) $t$ is a null-based tuple cause for $Q$ if some $R[i; c_j]$ is a null-based
attribute-value cause for $Q$.

(That is, the whole tuple is a cause if at least one of its attribute values is
changed into a null in some repair.)

(c) The responsibility, $\rho^{t\text{-}null}(t)$, of $t$, a null-based tuple cause for $Q$, is the
inverse of $\min\{|Diff^{null}(D, D')| : R[i; j] \in Diff^{null}(D, D')$, for some $j$, and
$D' \in Rep^{null}(D, \kappa(Q))\}$.

(d) The responsibility, $\rho^{a\text{-}null}(R[i; c_j])$, of $R[i; c_j]$, a null-based attribute-value
cause for $Q$, is the inverse of $\min\{|Diff^{null}(D, D')| : R[i; j] \in Diff^{null}(D,
D')$, and $D' \in Rep^{null}(D, \kappa(Q))\}$. □

In cases (c) and (d) we minimize over the number of changes in a repair that
are made together with that of the candidate tuple/attribute-value to be a
cause. In the case of a tuple cause, any change made in one of its attributes is
considered in the minimization. For this reason, the minimum may be smaller
than the one for a fixed attribute value change; and so the responsibility at
the tuple level may be greater than that at the attribute level. More precisely,
if $t = R(i; c_1, \ldots, c_n) \in D$, and $R[i; c_j]$ is a null-based attribute-value cause,
then it holds $\rho^{a\text{-}null}(R[i; c_j]) \leq \rho^{t\text{-}null}(t)$.

Example 21 (ex. [20] cont.) Consider $R(2; a_3, a_3) \in D$. Its projection on its
first (non-id) attribute, $R[2; a_3]$, is an attribute-level cause since $R[2; 1] \in
Diff^{null}(D, D_2)$. Also $R[2; 1] \in Diff^{null}(D, D_3)$.

Since $|Diff^{null}(D, D_2)| = |Diff^{null}(D, D_3)| = 2$, it holds $\rho^{a\text{-}null}(R[2; 1]) = \frac{1}{2}$.

Clearly $R(2; a_3, a_3)$ is a null-based tuple cause for $Q$, with $\rho^{t\text{-}null}(t) = \frac{1}{2}$. □
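
The responsibilities in Example 21 can be recomputed directly from the Diff sets of the repairs listed in Example 20. The sketch below follows cases (c) and (d) of Definition 15. The triple encoding of the strings $R[i; j]$, the function names, and the choice of returning `None` for a non-cause are assumptions of this illustration only.

```python
from fractions import Fraction
from typing import FrozenSet, List, Optional, Tuple

Change = Tuple[str, int, int]  # (relation, tuple id, attribute position)

# Diff^null(D, D_k) for the repairs D_1, ..., D_6 listed in Example 20.
DIFFS: List[FrozenSet[Change]] = [
    frozenset({("S", 5, 1)}),
    frozenset({("R", 2, 1), ("R", 3, 2)}),
    frozenset({("R", 2, 1), ("S", 6, 1)}),
    frozenset({("R", 2, 2), ("R", 3, 2)}),
    frozenset({("R", 2, 2), ("R", 3, 1)}),
    frozenset({("R", 2, 2), ("S", 6, 1)}),
]


def attribute_responsibility(rel: str, tid: int, pos: int,
                             diffs: List[FrozenSet[Change]]) -> Optional[Fraction]:
    """Case (d): inverse of the smallest Diff that contains the given change."""
    sizes = [len(diff) for diff in diffs if (rel, tid, pos) in diff]
    return Fraction(1, min(sizes)) if sizes else None  # None: not a cause


def tuple_responsibility(rel: str, tid: int,
                         diffs: List[FrozenSet[Change]]) -> Optional[Fraction]:
    """Case (c): inverse of the smallest Diff that changes any attribute of the tuple."""
    sizes = [len(diff) for diff in diffs
             if any(r == rel and i == tid for r, i, _ in diff)]
    return Fraction(1, min(sizes)) if sizes else None  # None: not a cause


print(attribute_responsibility("R", 2, 1, DIFFS))  # value of R[2; 1]
print(tuple_responsibility("R", 2, DIFFS))         # value for the tuple with id 2
```

Both calls return 1/2, in agreement with the example. The tuple with id 1 is changed in none of the listed repairs, so `tuple_responsibility("R", 1, DIFFS)` returns `None`.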

Notice that the definition of tuple-level responsibility, i.e. case (c) in
Definition 15, does not take into account that the same id, $i$, may appear several
times in a $Diff^{null}(D, D')$. In order to do so, we could redefine the size of the
latter by taking into account those multiplicities. For example, if we decrease
the size of the Diff by one with every repetition of the id, the responsibility
for a cause may (only) increase, which makes sense.
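
One possible reading of this adjustment is sketched below: every occurrence of the candidate tuple's id beyond the first reduces the size by one. The function name `adjusted_size` and the triple encoding are assumptions of this sketch. The Diff set is the one from the earlier illustration with $D' = \{R(1; a, null), S(2; null, d), S(3; null, null)\}$, reused here only to show the arithmetic, not as a claim that $D'$ is a repair.

```python
from typing import Set, Tuple

Change = Tuple[str, int, int]  # (relation, tuple id, attribute position)


def adjusted_size(diff: Set[Change], tid: int) -> int:
    """Size of diff, decreased by one for every repetition of the id tid."""
    occurrences = sum(1 for _, i, _ in diff if i == tid)
    return len(diff) - max(occurrences - 1, 0)


DIFF = {("R", 1, 2), ("S", 2, 1), ("S", 3, 1), ("S", 3, 2)}

print(len(DIFF), adjusted_size(DIFF, 3))  # plain size vs. adjusted size for id 3
print(adjusted_size(DIFF, 1))             # id 1 occurs once, so nothing changes
```

For id 3, which occurs twice, the size drops from 4 to 3, so a responsibility computed as the inverse would rise from 1/4 to 1/3. For ids that occur once the size is unchanged.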
8 Discussion and Conclusions

Our work opens interesting research directions, some of which are briefly
discussed below. They are a matter of ongoing and future research.


8.1 Endogenous repairs 

As discussed in Section 0 the partition of a database into endogenous and 
exogenous tuples may also be of interest in the context of repairs. We may 
prefer endogenous repairs that change (delete in this case) only endogenous 
tuples. However, if there are no endogenous tuples, a preference condition 
could be imposed on repairs, keeping those that change exogenous tuples the 
least. This is something to explore. 

As a further extension, it could be possible to assume that combinations of 
(only) exogenous tuples never violate the integrity constraints, which could be 
checked at upload time. In this sense, there would be a part of the database 
that is considered to be consistent, while the other is subject to possible repairs. 
For somehow related research, see m- 

Going a bit further, we could even consider the relations in the database
with an extra, binary attribute, $N$, that is used to annotate if a tuple is
endogenous or exogenous (it could be both), e.g. a tuple like $R(a, b, yes)$.
Integrity constraints could be annotated too, e.g. the “exogenous” version of
a DC $\kappa$ could be $\kappa^{E}\colon \leftarrow P(x, y, yes), R(y, z, yes)$, and could be assumed to be
satisfied.


8.2 Objections to causality 

Causality as introduced by Halpern and Pearl in [5^155] , aka. HP-causality, is 
the basis for the notion of causality in m- HP-causality has been the object of 
some criticism [55], which is justified in some (more complex, non-relational)
settings, especially due to the presence of different kinds of logical variables
(or lack thereof). In our context the objections do not apply: variables just 
say that a certain tuple belongs to the instance (or not); and for relational 
databases the closed-world assumption applies. In the definition of 

HP-causality is slightly modified. In our setting, this modified definition does 
not change actual causes or their properties. 


8.3 Open queries 

We have limited our discussion to boolean queries. It is possible to extend
our work to consider conjunctive queries with free variables, e.g. $Q(x)\colon
\exists y \exists z\,(R(x, y) \wedge S(y, z))$. In this case, a query answer would be of the form
$\langle a \rangle$, for $a$ a constant, and causes would be found for such an answer. In this
case, the associated DC would be of the form $\leftarrow R(a, y), S(y, z)$, and the rest
would be basically as above.


8.4 ASP specification of causes 

S-repairs can be specified by means of answer set programs (ASPs) [SUH], and 
C-repairs too, with the use of weak program constraints [5]. This should allow 
for the introduction of ASPs in the context of causality, for specification and 
reasoning. There are also ASP-based specifications of diagnosis [24] that could 
be brought into a more complete picture. 


8.5 Causes and functional dependencies, and beyond 

Functional dependencies are DCs with conjunctive violation views with
inequality, and are still monotonic. There is much research on repairs and
consistent query answering for functional dependencies, and more complex
integrity constraints [5]. In causality, mostly CQs without built-ins have been
considered. The repair connection could be exploited to obtain more refined
results for causality and CQs with inequality, and also other classes of queries,
even non-monotonic ones, that correspond to violation views for other kinds of
integrity constraints. In a different, but related direction, causality for
monotonic queries in the presence of integrity constraints has been investigated in

m- 


8.6 View updates and abduction 

Abduction pnii^ is another form of model-based diagnosis, and is related 
to the subjects investigated in this work. The view update problem, about 
updating a database through views, is a classical problem in databases that has 
been treated through abduction [371I1I]. User knowledge imposed through view 
updates creates or reflects uncertainty about the base data, because alternative 
base instances may give an account of the intended view updates. The view 
update problem, especially in its particular form of deletion propagation, has
been recently related in mmn to causality as introduced in [57]. (Notice only 
tuple deletions are used with violation views and repairs associated to DCs.) 

Database repairs are also related to the view update problem. Actually,
answer set programs (ASP) for database repairs [6] implicitly repair the database
by updating intentional, annotated predicates (cf. Section 8.4). Even more,
in [5], in order to protect sensitive information, databases are explicitly and
virtually “repaired” through secrecy views that specify the information that
has to be kept secret. These are prioritized repairs that have been specified
via ASPs. Abduction has been explicitly applied to database repairs [5].
The deep interrelations between causality, abductive reasoning, view
updates and repairs are the objects of our ongoing research efforts [iniEZ].

To conclude, let us emphasize that in this research we have unveiled and 
formalized some first interesting relationships between causality in databases, 
database repairs, and consistency-based diagnosis. These connections allow 
us to apply results and techniques developed for each of them to the others. 
This is particularly beneficial for causality in databases, where still a limited 
number of results and techniques have been obtained or developed. 

The connections we established here inspired complexity results for causal¬ 
ity, e.g. Theorems[2]and|ni and were used to prove them. We appealed to several 
non-trivial results found in [43] (and the proofs thereof found in [44]) about 
repairs and CQA. It is also the case that the well-established hitting-set
approach to diagnosis inspired a similar approach to causal responsibility, which
in turn allowed us to obtain results about its fixed-parameter tractability.
It is also the case that diagnostic reasoning, as a form of non-monotonic
reasoning, can provide a solid foundation for causality in databases and query
answer explanation, in general [T61IT7] . 

In ongoing research we have established connections between query answer 
causality, abductive diagnosis and database updates through views [S^- It is 
interesting that several of these areas of data management and knowledge 
representation, including those considered in this work, fall under what has 
been called “reverse data management” tasks [l^ . Our work establishes formal 
connections between them and sets the ground for further investigation into 
their interrelationships. 